Long short-term memory (LSTM) models are a type of [[recurrent neural network]]. An LSTM maintains two paths through time: a long-term memory path (the cell state) and a short-term memory path (the hidden state), which helps it overcome the vanishing or exploding gradient problem.
Each LSTM cell takes three inputs: the current input value, the previous short-term memory (hidden state), and the previous long-term memory (cell state). The LSTM cell contains three "gates":
- **forget gate**: determines what percentage of the long-term memory is remembered
- **input gate**: determines what percentage of a candidate value (computed from the new input and the short-term memory) is added to the long-term memory
- **output gate**: determines how much of the updated long-term memory becomes the new short-term memory passed on to the next LSTM cell
The [[sigmoid]] [[activation function]] is used in the gates to produce percentages between 0 and 1, while the [[tanh]] activation function is used to create the candidate memory value and the output.
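As a concrete sketch, here is one forward step of an LSTM cell in NumPy; the function name `lstm_cell_step` and the stacked-weight layout are illustrative assumptions, not a definitive implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the weights for the forget (f),
    input (i), candidate (g), and output (o) computations."""
    # Pre-activations for all four components at once
    z = W @ x_t + U @ h_prev + b
    f, i, g, o = np.split(z, 4)

    f = sigmoid(f)    # forget gate: % of long-term memory remembered
    i = sigmoid(i)    # input gate: % of the candidate value added
    g = np.tanh(g)    # candidate long-term memory value
    o = sigmoid(o)    # output gate: % of the output passed on

    c_t = f * c_prev + i * g    # new long-term memory (cell state)
    h_t = o * np.tanh(c_t)      # new short-term memory (hidden state)
    return h_t, c_t

# Example usage with arbitrary dimensions and random weights
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = rng.normal(size=(4 * n_hidden, n_in))
U = rng.normal(size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
h, c = lstm_cell_step(rng.normal(size=n_in), h, c, W, U, b)
```

Note how the sigmoid outputs act as the "percentages" described above, while tanh keeps the candidate and output values between -1 and 1.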
> [!Tip]- Additional Resources
> - [Long Short-Term Memory (LSTM), Clearly Explained](https://youtu.be/YCzL96nL7j0?si=MbzQG6U1Xe6ML5kN) | StatQuest with Josh Starmer