Intro to Large Language Models by Andrej Karpathy.
Videos
attention: the video is produced in 2023, so many fields has changed a lot.
Episode 1
Basic Intro
Constitution of LLM: parameters and executing file
eg. llama-2-70b: 140GB parameters and ~500 lines of c code
Training: text -> GPU -> parameter file
Key: Compressing the internet
Next word prediction forces the neural network to learn a lot about the world(extract the features or “understand” the rules?)
How does it work
pre-training -> finetuning -> assistant
Labeling is a human-machine collaboration
Future
- Scaling Laws
- Agents
- Multimodality
- LLM only have system1? “think”->
- self-improvement (like rl)
- LLM OS
Security
- Jailbreak
- Prompt injection
- Data poisoning