Intro to LLM

Intro to Large Language Models by Andrej Karpathy.

Videos

attention: the video is produced in 2023, so many fields has changed a lot.

Episode 1

Basic Intro

Constitution of LLM: parameters and executing file

eg. llama-2-70b: 140GB parameters and ~500 lines of c code

Training: text -> GPU -> parameter file

Key: Compressing the internet

Next word prediction forces the neural network to learn a lot about the world(extract the features or “understand” the rules?)

How does it work

pre-training -> finetuning -> assistant

1

Labeling is a human-machine collaboration

Future

  • Scaling Laws
  • Agents
  • Multimodality
  • LLM only have system1? “think”->
  • self-improvement (like rl)
  • LLM OS

Security

  • Jailbreak
  • Prompt injection
  • Data poisoning