FSDP + QLoRA

Use FSDP with QLoRA to fine-tune large LLMs on consumer GPUs.

Background

Using FSDP with QLoRA is essential for fine-tuning larger (70b+ parameter) LLMs on consumer GPUs. For example, you can use FSDP + QLoRA to train a 70b model on two 24GB GPUs.¹

Below, we describe how to use this feature in Axolotl.

Usage

To enable QLoRA with FSDP, you need to perform the following steps:

Tip: See the example config file in addition to reading these instructions.

  1. Set adapter: qlora in your axolotl config file.
  2. Enable FSDP in your axolotl config, as described here.
  3. Use one of the supported model types: llama, mistral, or mixtral (a combined config sketch is shown after this list).
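
Putting these steps together, a minimal sketch of the relevant config keys might look like the following. Only adapter: qlora and offload_params come from these docs; the base model path, the 4-bit loading flag, and the exact fsdp_config layout are assumptions, so treat the example config as the authoritative reference.

```yaml
# Sketch of a QLoRA + FSDP config (illustrative only); see
# examples/llama-2/qlora-fsdp.yml for the exact, tested settings.
base_model: NousResearch/Llama-2-70b-hf   # assumed path; any supported llama/mistral/mixtral model works
load_in_4bit: true                        # assumed flag: QLoRA loads the base model in 4-bit
adapter: qlora                            # step 1: use the QLoRA adapter

# step 2: enable FSDP (key layout assumed; check the FSDP docs for your Axolotl version)
fsdp_config:
  offload_params: true                    # offload sharded parameters to CPU
```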

Enabling Swap for FSDP2

If available memory is insufficient even after FSDP's CPU offloading, you can allow swap to be used by setting cpu_offload_pin_memory: false alongside offload_params: true in your FSDP config.

This disables memory pinning, allowing FSDP to use disk swap space as a fallback. Disabling memory pinning itself incurs a performance overhead, and actually having to use swap adds more, but it may enable training larger models that would otherwise cause OOM errors on resource-constrained systems.
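
For reference, the relevant portion of the config might look like the following; the two keys are the ones named above, while the enclosing fsdp_config section is an assumption about where they live.

```yaml
# Let FSDP's CPU offload spill into disk swap.
fsdp_config:
  offload_params: true            # offload sharded parameters to CPU
  cpu_offload_pin_memory: false   # disable pinned memory so the OS can page offloaded params to swap
```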

Example Config

examples/llama-2/qlora-fsdp.yml contains an example of how to enable QLoRA + FSDP in axolotl.

References

Footnotes

  1. This was enabled by this work from the Answer.AI team.