# FSDP + QLoRA

## Background
Using FSDP with QLoRA is essential for fine-tuning larger (70b+ parameter) LLMs on consumer GPUs. For example, you can use FSDP + QLoRA to train a 70b model on two 24GB GPUs.
Below, we describe how to use this feature in Axolotl.
## Usage
To enable QLoRA with FSDP, you need to perform the following steps:

> [!TIP]
> See the example config file in addition to reading these instructions.
- Set `adapter: qlora` in your axolotl config file.
- Enable FSDP in your axolotl config, as described here (see the sketch after this list).
- Use one of the supported model types: `llama`, `mistral`, or `mixtral`.
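A minimal sketch of what these steps might look like in an axolotl config is shown below. The `base_model` value is a placeholder, and the `load_in_4bit` and `fsdp_config` details beyond the keys named on this page are assumptions; consult the example config referenced at the end of this page for a complete, tested setup.

```yaml
# Minimal sketch of the steps above — not a complete axolotl config.
base_model: NousResearch/Llama-2-7b-hf   # placeholder; use any supported llama, mistral, or mixtral model
adapter: qlora                           # step 1: fine-tune with a QLoRA adapter
load_in_4bit: true                       # assumption: QLoRA keeps the base weights quantized to 4-bit

fsdp_config:                             # step 2: enable FSDP sharding
  offload_params: true                   # optionally offload sharded parameters to CPU
```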
## Enabling Swap for FSDP2

If available memory is insufficient even after FSDP's CPU offloading, you can enable swap memory usage by setting `cpu_offload_pin_memory: false` alongside `offload_params: true` in your FSDP config.
This disables memory pinning, allowing FSDP to use disk swap space as a fallback. Disabling memory pinning itself incurs a performance overhead, and actually having to use swap adds more, but it may enable training larger models that would otherwise cause OOM errors on resource-constrained systems.
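As a concrete illustration, the swap fallback described above corresponds to the following two keys in the FSDP section of an axolotl config; the rest of your existing `fsdp_config` is unchanged and omitted here.

```yaml
fsdp_config:
  offload_params: true            # offload sharded parameters to CPU
  cpu_offload_pin_memory: false   # unpinned CPU memory allows the OS to spill offloaded params to swap
```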
## Example Config

`examples/llama-2/qlora-fsdp.yml` contains an example of how to enable QLoRA + FSDP in axolotl.