notes

writing about language model training

2026 rl

Can an LLM Learn to Tell Other LLMs Apart? An RL Journey

Teaching Qwen3.5-9B to discover, in its own words, what makes Claude, ChatGPT, and Gemini write the way they do.

2026 post-training

The Imitation Game: State of Policy Distillation in Language Model Training

A survey of on-policy distillation in 2026: why it works, when it fails, and open problems.

2026 architectures

Slowrun and Gated Delta Net

Hybrid attention, linear-attention recurrences, and gated delta networks.

2026 training

How your choice of optimiser affects training (and why you should care)

A walkthrough of SGD, momentum, Adam, AdamW, and Lion on toy landscapes and CIFAR-10.

2025 rl

How reward structure affects LLM finetuning for domain-specific tasks

Hard rewards, soft rewards, and what GRPO actually does to your model.

~  ~  ~   ~   ~  ~  ~

© chinmay karkar