The Generative Pre-trained Transformer (GPT) is a [[large language model]] based on the [[transformer]] architecture.
The first GPT was introduced by Radford et al. in 2018. GPT-2 and GPT-3 followed in 2019 and 2020, respectively; these models were improved primarily by scaling the number of parameters and the size of the training set. ChatGPT, released publicly by OpenAI in 2022, is a conversational product built on a fine-tuned GPT-3.5 model rather than a new base model. GPT-4 debuted in 2023.
## positional encoding
Positional encoding translates the position of each token in the sequence into a vector of numeric values computed from sine and cosine functions of varying frequency. These values are added to the token embeddings to produce position-encoded embeddings.
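A minimal NumPy sketch of the sinusoidal scheme described above, following the formulation in the original transformer paper; the function name and the 10000 base are assumptions from that paper, not necessarily what a given GPT implementation uses (some GPT variants learn positional embeddings instead).

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, np.newaxis]            # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]           # (1, d_model/2)
    angle_rates = 1.0 / np.power(10000.0, dims / d_model)    # one frequency per dimension pair
    angles = positions * angle_rates                          # (seq_len, d_model/2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions use cosine
    return pe

# The encoding is added element-wise to the token embeddings:
# position_encoded = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```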
## self-attention
GPTs use self-attention, a type of [[base/Deep Learning/attention|attention]] mechanism, to weigh how strongly each token relates to every other token in the sequence, allowing the model to associate related words regardless of how far apart they appear.
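A minimal single-head sketch in NumPy, assuming the standard scaled dot-product formulation; here Q, K, and V stand for query, key, and value matrices obtained by linearly projecting the position-encoded embeddings, and the causal mask reflects GPT's decoder-only, left-to-right setup.

```python
import numpy as np

def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Single-head causal self-attention: each token's output is a weighted
    average of value vectors, weighted by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # (seq_len, seq_len) similarities
    # Causal mask: a token may only attend to itself and earlier tokens.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                       # (seq_len, d_k)
```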
> [!Tip]- Additional Resources
> - [Transformer Neural Networks, ChatGPT's foundation, Clearly Explained!!!](https://youtu.be/zxQyTK8quyY?si=OHEixssBYZBFmQUJ) | StatQuest with Josh Starmer
> - [Let's build GPT: from scratch, in code, spelled out.](https://youtu.be/kCc8FmEb1nY?si=AgA1roxfzvXZoDDI) | Andrej Karpathy