A decoder takes encoded information and generates an output sequence (e.g., translation, summary). In [[autoregressive generation]], the decoder generates one token at a time while conditioning on previously generated tokens.