Long short-term memory (LSTM) models are a type of [[recurrent neural network]]. An LSTM maintains two paths, a long-term memory path (the cell state) and a short-term memory path (the hidden state), to overcome the vanishing or exploding gradient problem. Each LSTM cell takes the current input value along with two memories carried over from the previous cell: a short-term memory value and a long-term memory value.

The LSTM cell contains three "gates":

- **forget gate**: determines what percentage of the long-term memory is remembered
- **input gate**: determines how the long-term memory should be updated given the new input
- **output gate**: determines how much of the short-term memory to pass on to the next LSTM cell

The [[sigmoid]] [[activation function]] is used when determining what percentage of a memory to remember, and the [[tanh]] activation function is used to create the candidate memory and the output (see the sketch below).

> [!Tip]- Additional Resources
> - [Long Short-Term Memory (LSTM), Clearly Explained](https://youtu.be/YCzL96nL7j0?si=MbzQG6U1Xe6ML5kN) | StatQuest with Josh Starmer
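To make the gate mechanics concrete, here is a minimal sketch of a single LSTM cell's forward pass in NumPy. The weight names (`W_f`, `W_i`, `W_c`, `W_o`) and the `lstm_cell` helper are illustrative, not from any particular library; a real model would use a framework implementation such as `torch.nn.LSTM`.

```python
# Minimal sketch of one LSTM cell step (illustrative names, not a library API).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, params):
    """One LSTM step: returns the new short-term (h) and long-term (c) memory."""
    # Concatenate the previous short-term memory with the current input.
    z = np.concatenate([h_prev, x_t])

    f = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate: % of long-term memory kept
    i = sigmoid(params["W_i"] @ z + params["b_i"])        # input gate: % of the candidate to add
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate update to long-term memory
    o = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate: % of memory passed on

    c_new = f * c_prev + i * c_tilde   # updated long-term memory (cell state)
    h_new = o * np.tanh(c_new)         # new short-term memory (hidden state)
    return h_new, c_new

# Usage with random weights: input size 3, hidden size 2.
rng = np.random.default_rng(0)
n_in, n_h = 3, 2
params = {w: rng.normal(size=(n_h, n_h + n_in)) for w in ["W_f", "W_i", "W_c", "W_o"]}
params.update({b: np.zeros(n_h) for b in ["b_f", "b_i", "b_c", "b_o"]})
h, c = np.zeros(n_h), np.zeros(n_h)
for x_t in rng.normal(size=(5, n_in)):  # run the cell over a 5-step sequence
    h, c = lstm_cell(x_t, h, c, params)
print("final short-term memory:", h)
```

Note how each sigmoid gate outputs values between 0 and 1, acting as the "percentages" described above, while tanh keeps the candidate memory and output between -1 and 1.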