As LLMs grow larger, full fine-tuning becomes increasingly impractical:

- Updating all parameters requires storing optimizer states for every parameter (roughly 3-4x the model size)
- Training a 7B-parameter model can cost $1,000+ for a single run
- Each fine-tuned model is a complete copy of the base model (7B-175B parameters)
- Managing multiple large model versions becomes unwieldy
Parameter-Efficient Fine-Tuning (PEFT) methods train only a small subset of parameters while keeping most of the pre-trained model frozen.
## Efficiently Updating Parameters
Often, updating 1% or less of a model's parameters achieves performance comparable to full fine-tuning.
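A quick way to check this ratio on a concrete model is to compare trainable and total parameter counts. Below is a minimal PyTorch sketch; the helper name is ours, not from the original text:

```python
import torch.nn as nn

def trainable_fraction(model: nn.Module) -> float:
    """Return the share of parameters that will receive gradient updates."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total
```

After a PEFT method is applied, this fraction typically lands well below 0.01.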
The main PEFT approaches, compared in the table below, are:

- **LoRA (Low-Rank Adaptation):** Adds small trainable low-rank matrices alongside frozen pre-trained weights (a usage sketch follows this list)
- **Adapters:** Insert small trainable modules between layers of the pre-trained model
- **Prompt Tuning:** Adds trainable continuous embedding vectors to the input sequence
- **Prefix Tuning:** Adds trainable prefix parameters to each transformer layer
- **QLoRA:** Combines quantization with LoRA for even greater efficiency
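As a concrete illustration of the LoRA approach from the list above, here is a hedged sketch using the Hugging Face `peft` library; the library choice, model name, and hyperparameter values are our assumptions, not from the original text:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a frozen pre-trained model (model name is a placeholder).
model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA configuration: rank, scaling factor, and which weight matrices to adapt.
config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor; the update is scaled by alpha/r
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # reports trainable vs. total parameters
```

Only the injected low-rank matrices are registered as trainable; the base weights stay frozen, so an adapter checkpoint is megabytes rather than gigabytes.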
LoRA represents weight updates using a low-rank decomposition:

$$W = W_0 + \Delta W, \qquad \Delta W = BA$$

where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and $r \ll \min(d, k)$, so only $r(d + k)$ parameters are trained instead of $dk$.
Key Parameters:

- **r (rank):** the dimension of the low-rank matrices; small values (e.g., 4-64) are common
- **α (alpha):** a scaling factor; the update is applied as $(\alpha / r)\,BA$
- **Target modules:** which weight matrices receive the low-rank update, commonly the attention projections
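To make the decomposition concrete, here is a minimal from-scratch sketch of a LoRA-wrapped linear layer in PyTorch. The class and parameter names are ours; this illustrates the math above, not any particular library's implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank update: W0 x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze W0
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        d, k = base.out_features, base.in_features
        # A starts with small random values and B with zeros, so Delta W = BA
        # is zero at initialization and training begins from the pre-trained model.
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)
        self.B = nn.Parameter(torch.zeros(d, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank update; only A and B receive gradients.
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# Example: adapt a 768 -> 768 projection with rank 8.
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16.0)
```

With $d = k = 768$ and $r = 8$, the trainable update has $8 \times (768 + 768) = 12{,}288$ parameters versus $589{,}824$ in the frozen weight matrix, about 2%.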
| Method | Parameter Efficiency | Memory Usage | Training Speed | Performance |
|---|---|---|---|---|
| LoRA | Very High | Very Low | Fast | Excellent |
| Adapters | High | Low | Medium | Good |
| Prompt Tuning | Extremely High | Extremely Low | Very Fast | Variable |
| QLoRA | Very High | Extremely Low | Medium | Excellent |
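Since QLoRA appears in both the method list and the table, here is a hedged sketch of a QLoRA-style setup combining 4-bit quantization with LoRA via `transformers`, `bitsandbytes`, and `peft`. The library choices, model name, and hyperparameter values are our assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the frozen base model in 4-bit NF4 precision to cut memory further.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model name
    quantization_config=bnb_config,
)

# Attach trainable LoRA adapters on top of the quantized weights.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
model.print_trainable_parameters()
```

The base weights sit in 4-bit precision while gradients flow only through the small LoRA matrices, which is what lets the table's "Extremely Low" memory figure coexist with LoRA-level quality.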