notes
writing about language model training
-
The Imitation Game: State of Policy Distillation in Language Model Training
A survey of on-policy distillation in 2026 — why it works, when it fails, and open problems.
-
Slowrun and Gated Delta Net
Hybrid attention, linear-attention recurrences, and gated delta networks.
-
How your choice of optimiser affects training (and why you should care)
A walk through SGD, momentum, Adam, AdamW, and Lion on toy landscapes and CIFAR-10.
-
How reward structure affects LLM finetuning for domain-specific tasks
Hard rewards, soft rewards, and what GRPO actually does to your model.