Autoregressive generation is the process of generating a random sequence that reflects the distribution of some data by feeding each prediction back in as input for the next prediction. This is the basis of most [[generative AI]] applications today. The concept is often attributed to [[Claude Shannon]], whose 1948 work on [[information theory]], and especially the notion of [[entropy]], included generating text from the conditional statistics of letters and words. An autoregressive model predicts $x_t$ from all of the preceding points: $x_t = f(x_{t-1}, x_{t-2}, \dots)$. Contrast this with an ordinary regression model, which predicts an output $y$ from separate inputs $x$. A [[large language model]] is autoregressive because each word it predicts is conditioned on all of the previously predicted words. GPT is an autoregressive language model.
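
The feedback loop above can be sketched as a tiny Shannon-style character model: count which character follows which in a corpus, then generate by repeatedly sampling the next character conditioned on the one just produced. The corpus and the order-1 (single-character) context here are illustrative choices, not from the note.

```python
import random
from collections import defaultdict

# Toy corpus for estimating next-character statistics (illustrative).
corpus = "the cat sat on the mat and the cat ran"

# Count transitions: for each character, collect every character
# observed to follow it in the corpus.
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(seed: str, length: int, rng: random.Random) -> str:
    """Autoregressive generation: each new character is sampled
    conditioned on the previously generated character."""
    out = seed
    for _ in range(length):
        choices = transitions.get(out[-1])
        if not choices:  # no observed continuation; stop early
            break
        out += rng.choice(choices)  # feed the prediction back in
    return out

print(generate("t", 30, random.Random(0)))
```

A real language model replaces the count table with a learned conditional distribution over the full preceding context, but the sampling loop is the same.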