LLM Learning Portal

Parameter-Efficient Fine-tuning


The Challenge of Full Fine-tuning

As LLMs grow larger, full fine-tuning becomes increasingly impractical:

Full Fine-tuning Limitations

  • Memory Requirements

    Updating all parameters means storing gradients and optimizer states (e.g. Adam's two moment estimates) for every parameter, roughly 3-4x the model's own memory footprint (a rough estimate follows this list)

  • Compute Costs

    Training a 7B parameter model can cost $1,000+ for a single run

  • Storage Overhead

    Each fine-tuned model is a complete copy of the base model (7B-175B parameters)

  • Deployment Complexity

    Managing multiple large model versions becomes unwieldy
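
To see where the "3-4x" figure comes from, here is a rough back-of-the-envelope sketch, assuming fp16 weights and gradients with Adam's fp32 moment estimates, and ignoring activations and any fp32 master copy of the weights:

```python
# Rough memory estimate for full fine-tuning of a 7B-parameter model with Adam.
# Assumes fp16 weights/gradients and fp32 optimizer states; activations are ignored.
params = 7e9

weights_gb = params * 2 / 1e9   # fp16 weights:           ~14 GB
grads_gb   = params * 2 / 1e9   # fp16 gradients:         ~14 GB
adam_m_gb  = params * 4 / 1e9   # fp32 first moment (m):  ~28 GB
adam_v_gb  = params * 4 / 1e9   # fp32 second moment (v): ~28 GB

# The optimizer states alone are ~4x the fp16 model size; the total is larger still.
total_gb = weights_gb + grads_gb + adam_m_gb + adam_v_gb
print(f"~{total_gb:.0f} GB before activations")   # ~84 GB
```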

The PEFT Solution

Parameter-Efficient Fine-Tuning (PEFT) methods train only a small subset of parameters while keeping most of the pre-trained model frozen.

Efficiently Updating Parameters

Often only 1% or less of a model's parameters need to be updated to achieve performance comparable to full fine-tuning.
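
A minimal sketch of the core recipe, assuming a generic PyTorch model (the layers below are toy stand-ins, not a real LLM): freeze every pre-trained parameter, add a small trainable module, and check the trainable ratio.

```python
import torch.nn as nn

# Toy "pre-trained" model standing in for a real LLM.
base_model = nn.Sequential(nn.Linear(512, 512), nn.Linear(512, 512))

# 1. Freeze every pre-trained parameter.
for p in base_model.parameters():
    p.requires_grad = False

# 2. Add a small trainable module (e.g. an adapter or LoRA matrices).
trainable_extra = nn.Linear(512, 8)   # stands in for the PEFT parameters

trainable = sum(p.numel() for p in trainable_extra.parameters())
total = sum(p.numel() for p in base_model.parameters()) + trainable
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.2f}%)")
```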

PEFT Methods Overview

LoRA (Low-Rank Adaptation)

Adds small trainable low-rank matrices alongside frozen pre-trained weights

Key idea: Approximate weight updates with low-rank matrices (r << d)

Adapters

Insert small trainable modules between layers of the pre-trained model

Key idea: Down-project, apply non-linearity, then up-project back to original dimension
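
A minimal sketch of a bottleneck adapter, assuming the standard down-project / non-linearity / up-project layout with a residual connection (the dimensions are illustrative):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted after a frozen transformer sublayer."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)   # down-project
        self.up = nn.Linear(bottleneck, d_model)     # up-project back
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the pre-trained representation intact.
        return x + self.up(self.act(self.down(x)))

# Example: adapt a 768-dimensional hidden state.
adapter = Adapter(d_model=768)
hidden = torch.randn(2, 16, 768)   # (batch, sequence, d_model)
out = adapter(hidden)              # same shape as the input
```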

Prompt Tuning

Add trainable continuous embedding vectors to the input sequence

Key idea: Learn soft prompts that shape model behavior without changing model parameters
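
A minimal sketch of soft prompts, assuming the trainable vectors are simply prepended to the token embeddings before the frozen transformer runs (shapes are illustrative):

```python
import torch
import torch.nn as nn

d_model, num_virtual_tokens = 768, 20

# Trainable "soft prompt": continuous vectors, not real vocabulary tokens.
soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, d_model) * 0.02)

def prepend_soft_prompt(token_embeddings: torch.Tensor) -> torch.Tensor:
    """token_embeddings: (batch, seq_len, d_model) from the frozen embedding layer."""
    batch = token_embeddings.size(0)
    prompt = soft_prompt.unsqueeze(0).expand(batch, -1, -1)
    return torch.cat([prompt, token_embeddings], dim=1)   # (batch, 20 + seq_len, d_model)

embeddings = torch.randn(4, 32, d_model)     # stand-in for real token embeddings
extended = prepend_soft_prompt(embeddings)   # fed to the frozen transformer as usual
```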

Prefix Tuning

Add trainable prefix parameters to each transformer layer

Key idea: Task-specific activations at each layer steer the model's behavior
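
One common formulation prepends trainable key/value vectors to every attention layer. A minimal sketch, with illustrative shapes and a hypothetical helper that a frozen attention layer would call:

```python
import torch
import torch.nn as nn

num_layers, num_heads, head_dim, prefix_len = 12, 12, 64, 10

# One trainable prefix of keys and values per transformer layer.
prefix_keys = nn.ParameterList(
    [nn.Parameter(torch.randn(prefix_len, num_heads, head_dim) * 0.02) for _ in range(num_layers)]
)
prefix_values = nn.ParameterList(
    [nn.Parameter(torch.randn(prefix_len, num_heads, head_dim) * 0.02) for _ in range(num_layers)]
)

def extend_kv(layer_idx, keys, values):
    """keys/values: (batch, seq_len, num_heads, head_dim) inside a frozen attention layer."""
    batch = keys.size(0)
    pk = prefix_keys[layer_idx].unsqueeze(0).expand(batch, -1, -1, -1)
    pv = prefix_values[layer_idx].unsqueeze(0).expand(batch, -1, -1, -1)
    # Attention now also attends over the learned prefix positions.
    return torch.cat([pk, keys], dim=1), torch.cat([pv, values], dim=1)
```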

QLoRA

Combines quantization with LoRA for even greater efficiency

Key idea: 4-bit quantization of base model with LoRA adapter training
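
A hedged sketch of the usual recipe with the Hugging Face transformers, bitsandbytes, and peft libraries (the model id and argument values are illustrative, and exact argument names can vary across library versions):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 4-bit (NF4) quantized weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",          # example model id
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters; only these small full-precision matrices are trained.
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                         lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```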

LoRA: A Deeper Look

How LoRA Works

LoRA represents weight updates using low-rank decomposition:

W = W₀ + ΔW

ΔW = BA

where B ∈ ℝ^(d×r), A ∈ ℝ^(r×k), and r ≪ min(d, k)

Key Parameters:

  • Rank (r): Usually 4-256 (smaller = more efficient)
  • Alpha (α): Scaling factor applied to the low-rank update (the update is scaled by α/r)
  • Target modules: Which weight matrices to adapt
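
A minimal sketch of a LoRA-augmented linear layer, assuming the formulation above with B initialized to zero (so training starts from the pre-trained behavior) and the update scaled by α/r:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer W0 plus a trainable low-rank update scaled by alpha/r."""
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad = False               # W0 stays frozen
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)   # A: r x d_in
        self.B = nn.Parameter(torch.zeros(d_out, r))         # B: d_out x r, zero-init => ΔW = 0 at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W0ᵀ + scale * x (BA)ᵀ
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(d_in=768, d_out=768, r=8)
y = layer(torch.randn(4, 768))
```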

LoRA Benefits

  • No inference latency (the low-rank update can be merged into the original weights; see the merge sketch after this list)
  • Dramatically reduced memory usage (10-100x less)
  • Small adapter size (10-100 MB vs. several GB)
  • Comparable performance to full fine-tuning
  • Adapters can be swapped without reloading base model
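
The "no inference latency" point holds because the low-rank update can be folded back into the frozen weight once training is done. A minimal, self-contained sketch with illustrative values:

```python
import torch

d_out, d_in, r, alpha = 768, 768, 8, 16.0

W0 = torch.randn(d_out, d_in)      # frozen pre-trained weight
A = torch.randn(r, d_in) * 0.01    # trained LoRA factors
B = torch.randn(d_out, r) * 0.01

# Merge once after training: the deployed weight is a single dense matrix again,
# so inference is exactly as fast as the original model.
W_merged = W0 + (alpha / r) * (B @ A)

x = torch.randn(4, d_in)
y_adapter = x @ W0.t() + (alpha / r) * (x @ A.t() @ B.t())
y_merged = x @ W_merged.t()
assert torch.allclose(y_adapter, y_merged, atol=1e-4)
```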

LoRA in Practice

[Diagram] The input passes through the frozen pre-trained weights W₀ and, in parallel, through the trainable low-rank matrices A and B (rank r = 8); the two outputs are summed. Only ~0.1-1% of parameters are trained.

Common Target Modules

  • Query/Key/Value matrices: Most commonly adapted in attention layers
  • Feed-forward projections: Often targeted in MLP layers
  • Output projections: Attention output and layer connections

With LoRA, you can fine-tune a 7B parameter model on a single consumer GPU with 16GB of VRAM (a configuration sketch follows below).
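
A hedged configuration sketch with the peft library; module names such as q_proj, v_proj, and up_proj follow LLaMA-style models and differ by architecture, and the model id is only an example:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example model id

config = LoraConfig(
    r=8,                  # rank of the low-rank update
    lora_alpha=16,        # scaling factor
    lora_dropout=0.05,
    # Attention projections plus MLP projections; adjust to the target architecture.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "down_proj"],
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, config)
model.print_trainable_parameters()   # typically well under 1% of all parameters
```

In practice, fitting a 7B model into 16GB usually also relies on quantizing the frozen base weights (as in the QLoRA sketch earlier) or other memory-saving tricks such as gradient checkpointing.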

Comparing PEFT Methods

Method          Parameter Efficiency   Memory Usage     Training Speed   Performance
LoRA            Very High              Very Low         Fast             Excellent
Adapters        High                   Low              Medium           Good
Prompt Tuning   Extremely High         Extremely Low    Very Fast        Variable
QLoRA           Very High              Extremely Low    Medium           Excellent