Knowledge distillation is the process of a student model learning from a larger teacher model. Methods include:

- supervised finetuning: train the student on examples generated or annotated by the teacher model
- divergence and similarity: minimize the divergence between the teacher's and student's probability distributions, or maximize the similarity between their hidden states (see the sketch after this list)
- reinforcement learning: first train a reward model, then train the student model with reinforcement learning
- rank similarity: train the student to rank outputs similarly to the teacher
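
As a concrete illustration of the divergence-based approach, below is a minimal sketch of a distillation loss, assuming PyTorch and two models that both emit logits over the same vocabulary. The names `student_logits`, `teacher_logits`, `temperature`, and `alpha` are illustrative, not from the original text.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions."""
    # Soften both distributions with the same temperature so the student
    # can learn from the teacher's full probability mass, not just the argmax.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # "batchmean" matches the mathematical definition of KL divergence;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```

In practice this soft loss is often mixed with the ordinary hard-label cross-entropy on ground-truth targets, e.g. `loss = alpha * distillation_loss(s_logits, t_logits) + (1 - alpha) * ce_loss`, where `alpha` balances how much the student follows the teacher versus the labels.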