Low-Rank Adaptation (LoRA) is a reparameterization technique for fine-tuning that uses low-rank adaptor matrices to update selected layers (target modules) of a pretrained model.
1. Freeze the pretrained weights (no gradient updates are computed for them)
2. Pick a few layers to adapt, called target modules
3. Create new low-rank adaptor matrices, two per target module: one that projects down to a small rank and one that projects back up to the original dimension
4. Train only the adaptor matrices; their product is added to the frozen weights of the target modules (a minimal sketch of the math follows this list)
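A minimal sketch of the underlying math in plain PyTorch (the dimensions and values here are illustrative, and this is not the `peft` implementation):
```python
import torch

d, r, alpha = 4096, 8, 16     # illustrative layer width, rank, and scaling factor
W = torch.randn(d, d)         # pretrained weight: frozen, never updated
A = torch.randn(r, d) * 0.01  # down-projection adaptor (trainable)
B = torch.zeros(d, r)         # up-projection adaptor (trainable, initialized to zero)

x = torch.randn(1, d)         # one input activation
# The adaptor path is added to the frozen layer's output, scaled by alpha / r,
# so the effective weight is W + (alpha / r) * (B @ A).
y = x @ W.T + (alpha / r) * (x @ A.T) @ B.T
```
Because B starts at zero, the adaptor contributes nothing at initialization, so training begins from the pretrained model's behavior.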
### Hyperparameters
- **Rank (r)**: how many dimensions in the low-rank matrices. Start with 8 and double until diminishing returns are observed.
- **Alpha ($\alpha$)**: scaling factor for the adaptor update (the update is scaled by $\alpha / r$), not the learning rate. Start with $\alpha = 2r$.
- **Target Modules**: which layers of the pretrained model will be adapted. The attention layers are the typical choice, but it depends on the task.
- **Quantization**: amount of [[quantization]] to use.
- **Dropout**: amount of [[dropout]] to reduce overfitting.
- **Epochs**: number of passes through the entire training set.
- **Batch size**: number of data points in each forward/backward pass.
- **Learning rate**: size of the step taken in gradient descent. Often scheduled to decay as a minimum is approached.
- **Gradient accumulation**: number of forward/backward passes whose gradients are summed before each optimizer step (effectively increases the batch size).
- **Optimizer**: the rule used to update the weights from the backpropagated gradients (e.g., Adam). See the sketch after this list for how these map onto training arguments.
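As a rough sketch, these general training hyperparameters map onto Hugging Face `TrainingArguments` as follows (the values are placeholders, not recommendations):
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./lora-finetune",    # hypothetical output directory
    num_train_epochs=3,              # epochs: passes over the training set
    per_device_train_batch_size=4,   # batch size per forward/backward pass
    gradient_accumulation_steps=4,   # accumulate gradients over 4 batches per optimizer step
    learning_rate=2e-4,              # step size for gradient descent
    lr_scheduler_type="cosine",      # lower the learning rate as training proceeds
    optim="adamw_torch",             # optimizer (an Adam variant)
)
```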
To use LoRA in [[Python]], install the `peft` package. To decide which modules to target, print the model and review its architecture.
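For example, assuming a Hugging Face `transformers` causal language model, printing it lists the named submodules, and the attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) show up by name:
```python
from transformers import AutoModelForCausalLM

# Printing the model shows its module tree; the attention projection
# names are the usual candidates for LoRA target modules.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
print(model)
```
With the target modules identified, a minimal LoRA setup looks like this: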
```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B"
LORA_R = 32
LORA_ALPHA = 64
TARGET_MODULES = ["q_proj", "v_proj", "k_proj", "o_proj"]

# Llama models are gated: log in to Hugging Face first (e.g. huggingface_hub.login())
base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Wrap the frozen base model with trainable LoRA adaptors on the target modules
lora_config = LoraConfig(r=LORA_R, lora_alpha=LORA_ALPHA, target_modules=TARGET_MODULES)
fine_tuned = get_peft_model(base_model, lora_config)
fine_tuned.print_trainable_parameters()
```
This configuration reduces the number of trainable parameters to about 27 million, roughly 109 MB of adaptor weights.
When quantization is applied to the base model in addition to LoRA, the technique is called QLoRA.
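A rough QLoRA sketch, assuming `bitsandbytes` is installed and 4-bit NF4 quantization is wanted (the settings are illustrative):
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model in 4-bit precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# Attach trainable LoRA adaptors on top of the quantized base model
lora_config = LoraConfig(r=32, lora_alpha=64, lora_dropout=0.1,
                         target_modules=["q_proj", "v_proj", "k_proj", "o_proj"])
model = get_peft_model(base_model, lora_config)
```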