Low-Rank Adaptation (LoRA) is a reparameterization technique for fine-tuning that uses low-rank adaptor matrices to update selected layers (target modules) of a pretrained model.
1. Freeze the pretrained weights (no gradient updates are computed for them)
2. Pick a few layers to adapt, called target modules
3. Create new low-rank adaptor matrices, two per target module: one that projects down to a small rank and one that projects back up to the original dimension
4. Train only the adaptor matrices; their product is added to the frozen weights of the target modules (a minimal sketch of the math follows this list)
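A minimal sketch of the underlying math in plain PyTorch (the dimensions and values here are illustrative, and this is not the `peft` implementation):
```python
import torch

d, r, alpha = 4096, 8, 16     # illustrative layer width, rank, and scaling factor
W = torch.randn(d, d)         # pretrained weight: frozen, never updated
A = torch.randn(r, d) * 0.01  # down-projection adaptor (trainable)
B = torch.zeros(d, r)         # up-projection adaptor (trainable, initialized to zero)

x = torch.randn(1, d)         # one input activation
# The adaptor path is added to the frozen layer's output, scaled by alpha / r,
# so the effective weight is W + (alpha / r) * (B @ A).
y = x @ W.T + (alpha / r) * (x @ A.T) @ B.T
```
Because B starts at zero, the adaptor contributes nothing at initialization, so training begins from the pretrained model's behavior.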
### Hyperparameters
- **Rank (r)**: how many dimensions in the low-rank matrices. Start with 8 and double until diminishing returns are observed.
- **Alpha ($\alpha$)**: scaling factor for the adaptor update (the update is scaled by $\alpha / r$), not the learning rate. Start with $\alpha = 2r$.
- **Target Modules**: which layers of the pretrained model will be adapted. The attention layers are the typical choice, but it depends on the task.
- **Quantization**: amount of [[quantization]] to use.
- **Dropout**: amount of [[dropout]] to reduce overfitting.
- **Epochs**: number of passes through the entire training set.
- **Batch size**: number of data points in each forward/backward pass.
- **Learning rate**: size of the step taken in gradient descent. Often scheduled to decay as a minimum is approached.
- **Gradient accumulation**: number of forward/backward passes whose gradients are summed before each optimizer step (effectively increases the batch size).
- **Optimizer**: the rule used to update the weights from the backpropagated gradients (e.g., Adam). See the sketch after this list for how these map onto training arguments.
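As a rough sketch, these general training hyperparameters map onto Hugging Face `TrainingArguments` as follows (the values are placeholders, not recommendations):
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./lora-finetune",    # hypothetical output directory
    num_train_epochs=3,              # epochs: passes over the training set
    per_device_train_batch_size=4,   # batch size per forward/backward pass
    gradient_accumulation_steps=4,   # accumulate gradients over 4 batches per optimizer step
    learning_rate=2e-4,              # step size for gradient descent
    lr_scheduler_type="cosine",      # lower the learning rate as training proceeds
    optim="adamw_torch",             # optimizer (an Adam variant)
)
```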
To use LoRA in [[Python]], install the `peft` package. To decide which modules to target, print the model and review its architecture.
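For example, assuming a Hugging Face `transformers` causal language model, printing it lists the named submodules, and the attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) show up by name:
```python
from transformers import AutoModelForCausalLM

# Printing the model shows its module tree; the attention projection
# names are the usual candidates for LoRA target modules.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
print(model)
```
With the target modules identified, a minimal LoRA setup looks like this: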
```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B"
LORA_R = 32
LORA_ALPHA = 64
TARGET_MODULES = ["q_proj", "v_proj", "k_proj", "o_proj"]

# Llama models are gated: log in to Hugging Face first (e.g. huggingface_hub.login())
base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Wrap the frozen base model with trainable LoRA adaptors on the target modules
lora_config = LoraConfig(r=LORA_R, lora_alpha=LORA_ALPHA, target_modules=TARGET_MODULES)
fine_tuned = get_peft_model(base_model, lora_config)
fine_tuned.print_trainable_parameters()
```
This configuration reduces the number of trainable parameters to about 27 million, roughly 109 MB of adaptor weights.
When quantization is applied to the base model in addition to LoRA, the technique is called QLoRA.
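A rough QLoRA sketch, assuming `bitsandbytes` is installed and 4-bit NF4 quantization is wanted (the settings are illustrative):
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model in 4-bit precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# Attach trainable LoRA adaptors on top of the quantized base model
lora_config = LoraConfig(r=32, lora_alpha=64, lora_dropout=0.1,
                         target_modules=["q_proj", "v_proj", "k_proj", "o_proj"])
model = get_peft_model(base_model, lora_config)
```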