Custom Integrations

Axolotl adds custom features through integrations. They are located within the src/axolotl/integrations directory.

To enable an integration, see its documentation.

Cut Cross Entropy

Cut Cross Entropy (CCE) reduces VRAM usage by optimizing the cross-entropy operation during loss calculation.

See https://github.com/apple/ml-cross-entropy
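To see why the cross-entropy step can dominate memory, consider the size of the dense logit matrix that a standard loss computation materializes: one value per token per vocabulary entry. The sketch below is back-of-the-envelope arithmetic only, not part of CCE or Axolotl; the function name and the example sizes are illustrative assumptions.

```python
def logits_bytes(num_tokens: int, vocab_size: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold a dense [num_tokens, vocab_size] logit matrix."""
    return num_tokens * vocab_size * bytes_per_value


# Illustrative sizes (assumptions): 8,192 tokens per batch,
# a 128,256-entry vocabulary, float32 logits (4 bytes each).
size = logits_bytes(8192, 128256)
print(f"{size / 2**30:.2f} GiB")  # prints "3.91 GiB"
```

As described in the CCE paper, the method avoids holding this full matrix in memory at once, which is where the VRAM savings come from.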

Requirements

PyTorch 2.4.0 or higher

Installation

Run the following command to install cut_cross_entropy[transformers] if you don’t have it already.

If you are in a dev environment

python scripts/cutcrossentropy_install.py | sh

If you are installing from pip

pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@78b2a45713a54c9bedf8b33f5e31cf07a1a57154"

Usage

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

Supported Models

cohere
cohere2
gemma
gemma2
gemma3
gemma3_text
glm
glm4
llama
llama4
llama4_text
mistral
mistral3
mllama
phi
phi3
phi4_multimodal
qwen2
qwen2_vl
qwen2_moe
qwen2_5_vl
qwen3
qwen3_moe

Citation

@article{wijmans2024cut,
  author       = {Erik Wijmans and
                  Brody Huval and
                  Alexander Hertzberg and
                  Vladlen Koltun and
                  Philipp Kr\"ahenb\"uhl},
  title        = {Cut Your Losses in Large-Vocabulary Language Models},
  journal      = {arXiv},
  year         = {2024},
  url          = {https://arxiv.org/abs/2411.09009},
}


Grokfast

See https://github.com/ironjr/grokfast

Usage

plugins:
  - axolotl.integrations.grokfast.GrokfastPlugin

grokfast_alpha: 2.0
grokfast_lamb: 0.98
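As background, the EMA variant described in the Grokfast paper keeps an exponential moving average of each gradient and adds an amplified copy of it back to the raw gradient. The sketch below shows that update on plain Python lists. It is an illustration, not Axolotl's plugin code. The parameter roles (alpha as the EMA momentum, lamb as the amplification factor) follow the Grokfast reference implementation; how the plugin maps grokfast_alpha and grokfast_lamb onto these roles is an assumption to verify against the plugin source. The function name and the numeric values are illustrative.

```python
def grokfast_ema_step(grad, ema, alpha, lamb):
    """One EMA-filter step on a flat list of gradient values.

    grad:  current raw gradient
    ema:   running average from the previous step, or None on the first step
    alpha: EMA momentum (assumed role)
    lamb:  amplification factor applied to the averaged gradient (assumed role)
    """
    if ema is None:
        # First step: start the running average at the current gradient
        ema = list(grad)
    else:
        ema = [alpha * e + (1 - alpha) * g for e, g in zip(ema, grad)]
    # Amplify the slow-moving component by adding it back to the raw gradient
    filtered = [g + lamb * e for g, e in zip(grad, ema)]
    return filtered, ema


# Illustrative values only
filtered, ema = grokfast_ema_step([1.0], None, alpha=0.98, lamb=2.0)
filtered, ema = grokfast_ema_step([0.0], ema, alpha=0.98, lamb=2.0)
```

After the second step the raw gradient is zero, but the filtered gradient is not, because the running average still carries the earlier signal.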

Citation

@article{lee2024grokfast,
    title={{Grokfast}: Accelerated Grokking by Amplifying Slow Gradients},
    author={Lee, Jaerin and Kang, Bong Gyun and Kim, Kihoon and Lee, Kyoung Mu},
    journal={arXiv preprint arXiv:2405.20233},
    year={2024}
}


Knowledge Distillation (KD)

Usage

plugins:
  - "axolotl.integrations.kd.KDPlugin"

kd_trainer: True
kd_ce_alpha: 0.1
kd_alpha: 0.9
kd_temperature: 1.0

torch_compile: True  # requires torch>=2.5.1; recommended to reduce VRAM usage

datasets:
  - path: ...
    type: "axolotl.integrations.kd.chat_template"
    field_messages: "messages_combined"
    logprobs_field: "llm_text_generation_vllm_logprobs"  # for kd only, field of logprobs

An example dataset can be found at axolotl-ai-co/evolkit-logprobs-pipeline-75k-v2-sample
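The sketch below shows one common way a distillation loss combines a cross-entropy term on the ground-truth token with a KL-divergence term against the teacher distribution, using weights and a temperature like those in the config above. The exact formula used by Axolotl's KD trainer is not stated here, so the combination rule, the function names, and the use of full logits (rather than stored teacher logprobs) are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, target,
            ce_alpha=0.1, kd_alpha=0.9, temperature=1.0):
    """Assumed form: ce_alpha * CE(student, target) + kd_alpha * KL(teacher || student)."""
    # Cross-entropy of the student against the ground-truth token
    ce = -math.log(softmax(student_logits)[target])
    # KL divergence between temperature-softened teacher and student distributions
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(teacher, student) if t > 0)
    return ce_alpha * ce + kd_alpha * kl


# Hypothetical three-token vocabulary
loss = kd_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], target=0)
```

When the student already matches the teacher, the KL term vanishes and only the weighted cross-entropy term remains.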


Liger Kernels

Liger Kernel provides efficient Triton kernels for LLM training, offering:

20% increase in multi-GPU training throughput
60% reduction in memory usage
Compatibility with both FSDP and DeepSpeed

See https://github.com/linkedin/Liger-Kernel

Usage

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true

Supported Models

deepseek_v2
gemma
gemma2
gemma3
granite
jamba
llama
mistral
mixtral
mllama
mllama_text_model
olmo2
paligemma
phi3
qwen2
qwen2_5_vl
qwen2_vl

Citation

@article{hsu2024ligerkernelefficienttriton,
      title={Liger Kernel: Efficient Triton Kernels for LLM Training},
      author={Pin-Lun Hsu and Yun Dai and Vignesh Kothapalli and Qingquan Song and Shao Tang and Siyu Zhu and Steven Shimizu and Shivam Sahni and Haowen Ning and Yanning Chen},
      year={2024},
      eprint={2410.10989},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.10989},
      journal={arXiv preprint arXiv:2410.10989},
}


Language Model Evaluation Harness (LM Eval)

Run evaluation on a model using the popular lm-evaluation-harness library.

See https://github.com/EleutherAI/lm-evaluation-harness

Usage

plugins:
  - axolotl.integrations.lm_eval.LMEvalPlugin

lm_eval_tasks:
  - gsm8k
  - hellaswag
  - arc_easy

lm_eval_batch_size: # Batch size for evaluation
output_dir: # Directory to save evaluation results

Citation

@misc{eval-harness,
  author       = {Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy},
  title        = {A framework for few-shot language model evaluation},
  month        = 07,
  year         = 2024,
  publisher    = {Zenodo},
  version      = {v0.4.3},
  doi          = {10.5281/zenodo.12608602},
  url          = {https://zenodo.org/records/12608602}
}


Spectrum

by Eric Hartford, Lucas Atkins, Fernando Fernandes, David Golchinfar

This plugin contains code to freeze the bottom fraction of modules in a model, based on the Signal-to-Noise Ratio (SNR).

See https://github.com/cognitivecomputations/spectrum

Overview

Spectrum is a tool for scanning the layers of a large language model and evaluating the Signal-to-Noise Ratio (SNR) of each one. Training only the top n% of layers with the highest SNR, and freezing the rest, improves training efficiency.

Usage

plugins:
  - axolotl.integrations.spectrum.SpectrumPlugin

spectrum_top_fraction: 0.5
spectrum_model_name: meta-llama/Meta-Llama-3.1-8B
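The selection step can be pictured as ranking modules by SNR and keeping the top fraction trainable. The sketch below is a simplified illustration under stated assumptions: the SNR values and module names are made up, all modules are ranked in one pool, and the function name is hypothetical. It is not Spectrum's or Axolotl's actual implementation.

```python
def select_trainable(snr_by_module, top_fraction):
    """Return the names of the modules with the highest SNR.

    snr_by_module: mapping of module name -> SNR score (hypothetical values)
    top_fraction:  fraction of modules to keep trainable, e.g. 0.5
    """
    ranked = sorted(snr_by_module, key=snr_by_module.get, reverse=True)
    keep = int(len(ranked) * top_fraction)
    return set(ranked[:keep])


# Made-up SNR scores for four modules
snr = {
    "layers.0.mlp.up_proj": 3.2,
    "layers.0.mlp.down_proj": 0.7,
    "layers.1.mlp.up_proj": 1.9,
    "layers.1.mlp.down_proj": 0.4,
}
trainable = select_trainable(snr, top_fraction=0.5)
frozen = set(snr) - trainable
```

With a top fraction of 0.5, the two highest-scoring modules stay trainable and the other two are frozen.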

Citation

@misc{hartford2024spectrumtargetedtrainingsignal,
      title={Spectrum: Targeted Training on Signal to Noise Ratio},
      author={Eric Hartford and Lucas Atkins and Fernando Fernandes Neto and David Golchinfar},
      year={2024},
      eprint={2406.06623},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2406.06623},
}


LLMCompressor

Fine-tune sparsified models in Axolotl using Neural Magic’s LLMCompressor.

This integration enables fine-tuning of models sparsified using LLMCompressor within the Axolotl training framework. By combining LLMCompressor’s model compression capabilities with Axolotl’s distributed training pipelines, users can efficiently fine-tune sparse models at scale.

It uses Axolotl’s plugin system to hook into the fine-tuning flows while maintaining sparsity throughout training.
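Conceptually, maintaining sparsity during fine-tuning means that weights pruned to zero stay zero after every optimizer step. The sketch below illustrates the idea with a fixed mask on plain Python lists; it is a conceptual illustration only, not LLMCompressor's implementation, and the function names and values are hypothetical.

```python
def build_mask(weights):
    """Record which weights are non-zero in the pre-sparsified checkpoint."""
    return [w != 0.0 for w in weights]


def masked_sgd_step(weights, grads, mask, lr):
    """Apply a plain SGD update, then force pruned positions back to zero."""
    return [w - lr * g if keep else 0.0
            for w, g, keep in zip(weights, grads, mask)]


# Hypothetical sparse weight vector: positions 1 and 3 were pruned
weights = [0.5, 0.0, -0.2, 0.0]
mask = build_mask(weights)
weights = masked_sgd_step(weights, [0.1, 0.3, -0.1, 0.2], mask, lr=0.1)
```

The mask is built once from the checkpoint and reused on every step, so gradients at pruned positions never reintroduce non-zero weights.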

Requirements

Axolotl with llmcompressor extras:
```
pip install "axolotl[llmcompressor]"
```
Requires llmcompressor >= 0.5.1

This will install all necessary dependencies to fine-tune sparsified models using the integration.

Usage

To enable sparse fine-tuning with this integration, include the plugin in your Axolotl config:

plugins:
  - axolotl.integrations.llm_compressor.LLMCompressorPlugin

llmcompressor:
  recipe:
    finetuning_stage:
      finetuning_modifiers:
        ConstantPruningModifier:
          targets: [
            're:.*q_proj.weight',
            're:.*k_proj.weight',
            're:.*v_proj.weight',
            're:.*o_proj.weight',
            're:.*gate_proj.weight',
            're:.*up_proj.weight',
            're:.*down_proj.weight',
          ]
          start: 0
  save_compressed: true
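The targets in the recipe above use a re: prefix to mark regular expressions over parameter names. The sketch below shows which names such patterns would select, assuming the prefix is stripped and the rest is matched from the start of the name with Python's re.match. That matching rule and the helper name are assumptions for illustration; the parameter names are typical of Llama-style models.

```python
import re


def matches_target(param_name: str, target: str) -> bool:
    """Check a parameter name against a target (assumed matching rule)."""
    if target.startswith("re:"):
        # Regex target: strip the prefix and match from the start of the name
        return re.match(target[len("re:"):], param_name) is not None
    # Plain target: require an exact name match
    return param_name == target


targets = ["re:.*q_proj.weight", "re:.*down_proj.weight"]
names = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.0.input_layernorm.weight",
]
selected = [n for n in names if any(matches_target(n, t) for t in targets)]
```

Here the two projection weights are selected and the layer norm weight is left out.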

This plugin does not apply pruning or sparsification itself — it is intended for fine-tuning models that have already been sparsified.

Pre-sparsified checkpoints can be:

- Generated using LLMCompressor
- Downloaded from Neural Magic’s Hugging Face page
- Any custom LLM with compatible sparsity patterns that you’ve created yourself

To learn more about writing and customizing LLMCompressor recipes, refer to the official documentation: https://github.com/vllm-project/llm-compressor/blob/main/README.md

Storage Optimization with save_compressed

Setting save_compressed: true in your configuration enables saving models in a compressed format, which:

- Reduces disk space usage by approximately 40%
- Maintains compatibility with vLLM for accelerated inference
- Maintains compatibility with llmcompressor for further optimization (example: quantization)

This option is highly recommended when working with sparse models to maximize the benefits of model compression.

Example Config

See examples/llama-3/sparse-finetuning.yaml for a complete example.

Inference with vLLM

After fine-tuning your sparse model, you can use vLLM for efficient inference. You can also use LLMCompressor to apply additional quantization to the fine-tuned sparse model before inference for further performance gains. The following example runs inference with vLLM:

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM("path/to/your/sparse/model")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

For more details on vLLM’s capabilities and advanced configuration options, see the official vLLM documentation.

Learn More

For details on available sparsity and quantization schemes, fine-tuning recipes, and usage examples, visit the official LLMCompressor repository:

https://github.com/vllm-project/llm-compressor


Adding a new integration

Plugins can be used to customize the behavior of the training pipeline through hooks. See axolotl.integrations.BasePlugin for the possible hooks.

To add a new integration, please follow these steps:

Create a new folder in the src/axolotl/integrations directory.
Add any relevant files (LICENSE, README.md, ACKNOWLEDGEMENTS.md, etc.) to the new folder.
Add __init__.py and args.py files to the new folder.

__init__.py should import the integration and hook into the appropriate functions.
args.py should define the arguments for the integration.

(If applicable) Add CPU tests under tests/integrations or GPU tests under tests/e2e/integrations.
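The steps above can be sketched as a minimal __init__.py. Everything named my_integration or MyIntegration below is hypothetical. The two hooks shown, get_input_args and pre_model_load, are assumed to match those used by the bundled integrations; consult BasePlugin for the authoritative list of hooks and their signatures. The fallback class exists only so the sketch runs without Axolotl installed.

```python
try:
    from axolotl.integrations.base import BasePlugin
except ImportError:
    class BasePlugin:  # stand-in so the sketch runs without Axolotl installed
        """Minimal placeholder; the real class defines the available hooks."""


class MyIntegrationPlugin(BasePlugin):
    """Hypothetical plugin for src/axolotl/integrations/my_integration/__init__.py."""

    def get_input_args(self):
        # Dotted path to the args class defined in args.py (hypothetical name)
        return "axolotl.integrations.my_integration.MyIntegrationArgs"

    def pre_model_load(self, cfg):
        # Runs before the model is loaded; reads a hypothetical config option
        if getattr(cfg, "my_integration_enabled", False):
            print("my_integration: enabled")
```

The plugin would then be enabled by listing axolotl.integrations.my_integration.MyIntegrationPlugin under plugins in the config.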

Tip

See src/axolotl/integrations/cut_cross_entropy for a minimal integration example.

Warning

If your integration fails to load, make sure you installed it with pip in editable mode:

pip install -e .

Also check that the integration name is spelled correctly in the config file:

plugins:
  - axolotl.integrations.your_integration_name.YourIntegrationPlugin
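A plugin entry like the one above is a dotted Python path: a module path followed by a class name. The sketch below shows one plausible way such a path is resolved, which also explains why a misspelled name fails at load time. It assumes the usual importlib pattern; Axolotl's actual loader may differ, and the helper name is hypothetical.

```python
import importlib


def resolve_plugin(dotted_path: str):
    """Split 'package.module.ClassName' and import the class (assumed behavior)."""
    module_name, class_name = dotted_path.rsplit(".", 1)
    # Raises ModuleNotFoundError if the module part is misspelled or not installed
    module = importlib.import_module(module_name)
    # Raises AttributeError if the class name is misspelled
    return getattr(module, class_name)


# Demonstrated with a standard-library class so the sketch runs anywhere
cls = resolve_plugin("collections.OrderedDict")
```

The same reasoning applies to integrations outside the integrations folder: the module part of the path must be importable from the active environment.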

Note

It is not necessary to place your integration in the integrations folder. It can live anywhere, as long as it is installed in a package in your Python environment.

See this repo for an example: https://github.com/axolotl-ai-cloud/diff-transformer