Custom Integrations
Axolotl adds custom features through integrations. They are located in the src/axolotl/integrations directory.
To enable an integration, see its documentation in the corresponding section below.
Cut Cross Entropy
Cut Cross Entropy (CCE) reduces VRAM usage by optimizing the cross-entropy operation during loss calculation.
See https://github.com/apple/ml-cross-entropy
Requirements
- PyTorch 2.4.0 or higher
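The version requirement above can be checked before enabling the plugin. The following is a minimal sketch using only the standard library; the helper name `meets_minimum` is hypothetical and not part of Axolotl or PyTorch.

```python
import re


def meets_minimum(version: str, minimum: tuple = (2, 4, 0)) -> bool:
    """Return True if a version string such as '2.5.1+cu121' is >= minimum."""
    # Keep only the leading numeric release components. Local suffixes such
    # as '+cu121' are ignored; a pre-release like '2.4.0a0' is treated as 2.4.0.
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(g) if g is not None else 0 for g in match.groups())
    return parts >= minimum
```

With PyTorch installed, the check would be called as `meets_minimum(torch.__version__)`.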
Installation
Run the following command to install cut_cross_entropy[transformers] if you don’t have it already.
python scripts/cutcrossentropy_install.py | sh
pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/apple/ml-cross-entropy.git@24fbe4b5dab9a6c250a014573613c1890190536c"
Usage
plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true
Supported Models
- llama
- llama4_text
- llama4
- mllama
- phi3
- gemma
- gemma2
- gemma3
- gemma3_text
- mistral
- mistral3
- qwen2
- cohere
- cohere2
Citation
@article{wijmans2024cut,
author = {Erik Wijmans and
Brody Huval and
Alexander Hertzberg and
Vladlen Koltun and
Philipp Kr{\"a}henb{\"u}hl},
title = {Cut Your Losses in Large-Vocabulary Language Models},
journal = {arXiv},
year = {2024},
url = {https://arxiv.org/abs/2411.09009},
}
Please see reference here
Grokfast
See https://github.com/ironjr/grokfast
Usage
plugins:
- axolotl.integrations.grokfast.GrokfastPlugin
grokfast_alpha: 2.0
grokfast_lamb: 0.98
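For orientation, here is a rough sketch of the EMA gradient filter described in the Grokfast paper, written on plain lists of floats rather than tensors. In the paper's convention, `alpha` is the EMA momentum (below 1) and `lamb` is the amplification factor applied to the slow gradient component. The function name and default values are illustrative assumptions, and how the `grokfast_alpha` and `grokfast_lamb` config keys map onto these roles should be checked against the plugin's own arguments.

```python
def grokfast_ema_step(grad, ema=None, alpha=0.98, lamb=2.0):
    """One EMA gradient-filter step; returns (filtered_grad, new_ema).

    grad: current gradient as a list of floats.
    ema:  running average from the previous step, or None on the first step.
    """
    if ema is None:
        # First step: start the running average at the current gradient.
        ema = list(grad)
    else:
        # Exponential moving average tracks the slowly varying component.
        ema = [alpha * e + (1.0 - alpha) * g for e, g in zip(ema, grad)]
    # Amplify the slow component by adding it back, scaled by lamb.
    filtered = [g + lamb * e for g, e in zip(grad, ema)]
    return filtered, ema
```

In a training loop the filtered gradient would replace the raw gradient before the optimizer step, and `ema` would be carried over to the next iteration.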
Citation
@article{lee2024grokfast,
title={{Grokfast}: Accelerated Grokking by Amplifying Slow Gradients},
author={Lee, Jaerin and Kang, Bong Gyun and Kim, Kihoon and Lee, Kyoung Mu},
journal={arXiv preprint arXiv:2405.20233},
year={2024}
}
Please see reference here
Knowledge Distillation (KD)
Usage
plugins:
- "axolotl.integrations.kd.KDPlugin"
kd_trainer: True
kd_ce_alpha: 0.1
kd_alpha: 0.9
kd_temperature: 1.0
torch_compile: True # torch>=2.5.1, recommended to reduce vram
datasets:
  - path: ...
    type: "axolotl.integrations.kd.chat_template"
    field_messages: "messages_combined"
    logprobs_field: "llm_text_generation_vllm_logprobs" # for kd only, field of logprobs
An example dataset can be found at axolotl-ai-co/evolkit-logprobs-pipeline-75k-v2-sample
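To give a sense of what the kd_alpha, kd_ce_alpha, and kd_temperature settings could control, the sketch below shows a common distillation objective for a single token: a weighted sum of a KL term against the teacher's log-probabilities and an ordinary cross-entropy term. This is an assumption about the general technique, not the plugin's actual loss, which may differ (for example, in how it handles truncated teacher log-probabilities or temperature scaling).

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def kd_loss(student_logits, teacher_logprobs, target_index,
            kd_alpha=0.9, kd_ce_alpha=0.1, kd_temperature=1.0):
    """Weighted sum of KL(teacher || student) and cross-entropy for one token."""
    # Soften both distributions with the same temperature (an assumption).
    student_lp = log_softmax(student_logits, kd_temperature)
    teacher_lp = log_softmax(teacher_logprobs, kd_temperature)
    # Forward KL divergence from the teacher to the student.
    kl = sum(math.exp(t) * (t - s) for t, s in zip(teacher_lp, student_lp))
    # Standard cross-entropy against the ground-truth token.
    ce = -log_softmax(student_logits)[target_index]
    return kd_alpha * kl + kd_ce_alpha * ce
```

When the student already matches the teacher, the KL term vanishes and only the cross-entropy term, scaled by kd_ce_alpha, remains.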
Please see reference here
Liger Kernels
Liger Kernel provides efficient Triton kernels for LLM training, offering:
- 20% increase in multi-GPU training throughput
- 60% reduction in memory usage
- Compatibility with both FSDP and DeepSpeed
See https://github.com/linkedin/Liger-Kernel
Usage
plugins:
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true
Supported Models
- deepseek_v2
- gemma
- gemma2
- gemma3 (partial support, no support for FLCE yet)
- granite
- jamba
- llama
- mistral
- mixtral
- mllama
- mllama_text_model
- olmo2
- paligemma
- phi3
- qwen2
- qwen2_5_vl
- qwen2_vl
Citation
@article{hsu2024ligerkernelefficienttriton,
title={Liger Kernel: Efficient Triton Kernels for LLM Training},
author={Pin-Lun Hsu and Yun Dai and Vignesh Kothapalli and Qingquan Song and Shao Tang and Siyu Zhu and Steven Shimizu and Shivam Sahni and Haowen Ning and Yanning Chen},
year={2024},
eprint={2410.10989},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.10989},
journal={arXiv preprint arXiv:2410.10989},
}
Please see reference here
Language Model Evaluation Harness (LM Eval)
Run evaluations on a model using the popular lm-evaluation-harness library.
See https://github.com/EleutherAI/lm-evaluation-harness
Usage
plugins:
- axolotl.integrations.lm_eval.LMEvalPlugin
lm_eval_tasks:
- gsm8k
- hellaswag
- arc_easy
lm_eval_batch_size: # Batch size for evaluation
output_dir: # Directory to save evaluation results
Citation
@misc{eval-harness,
author = {Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy},
title = {A framework for few-shot language model evaluation},
month = 07,
year = 2024,
publisher = {Zenodo},
version = {v0.4.3},
doi = {10.5281/zenodo.12608602},
url = {https://zenodo.org/records/12608602}
}
Please see reference here
Spectrum
by Eric Hartford, Lucas Atkins, Fernando Fernandes, David Golchinfar
This plugin contains code to freeze the bottom fraction of modules in a model, based on the Signal-to-Noise Ratio (SNR).
See https://github.com/cognitivecomputations/spectrum
Overview
Spectrum is a tool for scanning and evaluating the Signal-to-Noise Ratio (SNR) of layers in large language models. By identifying the top n% of layers with the highest SNR, you can optimize training efficiency.
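The selection step can be illustrated with a small sketch: rank modules by SNR, keep the top fraction trainable, and freeze the rest. The function name, module names, and SNR values are hypothetical, and the real tool may rank modules differently (for example, per module type), so treat this only as an outline of the idea.

```python
def split_by_snr(snr_by_module, top_fraction=0.5):
    """Return (trainable, frozen) lists of module names, ranked by SNR."""
    if not 0.0 <= top_fraction <= 1.0:
        raise ValueError("top_fraction must be between 0 and 1")
    # Highest SNR first.
    ranked = sorted(snr_by_module, key=snr_by_module.get, reverse=True)
    keep = int(len(ranked) * top_fraction)
    return ranked[:keep], ranked[keep:]


# Hypothetical SNR measurements for four modules.
example_snr = {
    "layers.0.mlp": 3.2,
    "layers.1.mlp": 0.7,
    "layers.2.mlp": 1.9,
    "layers.3.mlp": 2.4,
}
trainable, frozen = split_by_snr(example_snr, top_fraction=0.5)
```

With a fraction of 0.5, the two highest-SNR modules stay trainable and the other two would be frozen.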
Usage
plugins:
- axolotl.integrations.spectrum.SpectrumPlugin
spectrum_top_fraction: 0.5
spectrum_model_name: meta-llama/Meta-Llama-3.1-8B
Citation
@misc{hartford2024spectrumtargetedtrainingsignal,
title={Spectrum: Targeted Training on Signal to Noise Ratio},
author={Eric Hartford and Lucas Atkins and Fernando Fernandes Neto and David Golchinfar},
year={2024},
eprint={2406.06623},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2406.06623},
}
Please see reference here
Adding a new integration
Plugins can be used to customize the behavior of the training pipeline through hooks. See axolotl.integrations.BasePlugin for the possible hooks.
To add a new integration, please follow these steps:
- Create a new folder in the src/axolotl/integrations directory.
- Add any relevant files (LICENSE, README.md, ACKNOWLEDGEMENTS.md, etc.) to the new folder.
- Add __init__.py and args.py files to the new folder.
  - __init__.py should import the integration and hook into the appropriate functions.
  - args.py should define the arguments for the integration.
- (If applicable) Add CPU tests under tests/integrations or GPU tests under tests/e2e/integrations.
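The steps above might translate into something like the following skeleton. Everything named `my_integration` or `MyIntegration*` is hypothetical, the local `BasePlugin` class is only a stand-in so the sketch runs without Axolotl installed, and the hook names `get_input_args` and `pre_model_load` are assumptions to verify against axolotl.integrations.BasePlugin. A dataclass is used for the arguments to stay within the standard library; the bundled integrations may define theirs differently.

```python
from dataclasses import dataclass
from typing import Optional


class BasePlugin:
    """Stand-in for axolotl.integrations.BasePlugin (assumed hook names)."""

    def get_input_args(self):
        return None

    def pre_model_load(self, cfg):
        pass


# --- args.py (hypothetical): the config keys this integration accepts ---
@dataclass
class MyIntegrationArgs:
    my_integration: Optional[bool] = None
    my_integration_scale: float = 1.0


# --- __init__.py (hypothetical): the plugin that hooks into training ---
class MyIntegrationPlugin(BasePlugin):
    def __init__(self):
        super().__init__()
        self.enabled = False
        self.scale = 1.0

    def get_input_args(self):
        # Dotted path to the args class, so its fields can be validated.
        return "axolotl.integrations.my_integration.MyIntegrationArgs"

    def pre_model_load(self, cfg):
        # cfg is treated as a plain mapping here for simplicity.
        if cfg.get("my_integration"):
            self.enabled = True
            self.scale = cfg.get("my_integration_scale", 1.0)
```

In a real integration, the stand-in class would be dropped in favour of the actual base class, and the hook bodies would patch or configure the model as needed.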
See src/axolotl/integrations/cut_cross_entropy for a minimal integration example.
If your integration fails to load, ensure that you installed the package in editable mode:
pip install -e .
and that the integration name is spelled correctly in the config file:
plugins:
- axolotl.integrations.your_integration_name.YourIntegrationPlugin
It is not necessary to place your integration in the integrations folder. It can live in any location, as long as it is installed as a package in your Python environment.
See this repo for an example: https://github.com/axolotl-ai-cloud/diff-transformer
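Why installation and spelling both matter becomes clearer when looking at how a dotted plugin path is typically resolved. The sketch below is not Axolotl's loader, only an illustration of the general approach with `importlib`; the function name `load_plugin_class` is hypothetical.

```python
import importlib


def load_plugin_class(dotted_path: str):
    """Resolve 'package.module.ClassName' to the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    # Raises ModuleNotFoundError if the package is not installed
    # in the current Python environment.
    module = importlib.import_module(module_name)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        # The module exists but the class name is misspelled or missing.
        raise ImportError(
            f"module {module_name!r} has no attribute {class_name!r}"
        ) from exc
```

Any importable package works with this kind of lookup, which is why an integration does not have to live inside the integrations folder.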