Intro to LLM

Intro to Large Language Models by Andrej Karpathy.

Videos

attention: the video is produced in 2023, so many fields has changed a lot.

Episode 1

Constitution of LLM: parameters and executing file

eg. llama-2-70b: 140GB parameters and ~500 lines of c code

Training: text -> GPU -> parameter file

Key: Compressing the internet

Next word prediction forces the neural network to learn a lot about the world(extract the features or “understand” the rules?)

pre-training -> finetuning -> assistant

Labeling is a human-machine collaboration