Custom Integrations
Axolotl adds custom features through integrations. They are located within the src/axolotl/integrations directory.
To enable an integration, check its documentation below.
Cut Cross Entropy
Cut Cross Entropy (CCE) reduces VRAM usage by optimizing the cross-entropy operation during loss calculation.
See https://github.com/apple/ml-cross-entropy
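At a high level, CCE avoids materializing the full (num_tokens x vocab) logit matrix during the loss computation. As a rough, hand-written illustration of that memory-saving idea only (the real implementation is a fused Triton kernel; this sketch still materializes per-chunk logits, which CCE avoids entirely):
import torch
import torch.nn.functional as F

def chunked_cross_entropy(hidden, classifier_weight, targets, chunk_size=4096):
    """Illustrative sketch: compute CE over row chunks so only a
    (chunk_size x vocab) slice of logits exists at any time, instead of
    the full (num_tokens x vocab) matrix."""
    total = torch.zeros((), device=hidden.device)
    for start in range(0, hidden.size(0), chunk_size):
        h = hidden[start : start + chunk_size]   # (chunk, hidden_dim)
        t = targets[start : start + chunk_size]  # (chunk,)
        logits = h @ classifier_weight.T         # (chunk, vocab)
        total = total + F.cross_entropy(logits, t, reduction="sum")
    return total / targets.numel()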
Requirements
- PyTorch 2.4.0 or higher
Installation
Run one of the following commands to install cut_cross_entropy[transformers] if you don’t have it already.
- If you are in a dev environment:
python scripts/cutcrossentropy_install.py | sh
- If you are installing from pip:
pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@318b7e2"
Usage
plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
Supported Models
- apertus
- arcee
- cohere
- cohere2
- deepseek_v3
- gemma
- gemma2
- gemma3
- gemma3_text
- gemma3n
- gemma3n_text
- glm
- glm4
- glm4_moe
- glm4v
- glm4v_moe
- gpt_oss
- granite
- granitemoe
- granitemoeshared
- granitemoehybrid
- hunyuan_v1_dense
- hunyuan_v1_moe
- internvl
- kimi_linear
- lfm2
- lfm2_moe
- lfm2_vl
- llama
- llama4
- llama4_text
- llava
- ministral
- ministral3
- mistral
- mistral3
- mixtral
- mllama
- olmo
- olmo2
- olmo3
- phi
- phi3
- phi4_multimodal
- qwen2
- qwen2_vl
- qwen2_moe
- qwen2_5_vl
- qwen3
- qwen3_moe
- qwen3_vl
- qwen3_vl_moe
- qwen3_next
- smollm3
- seed_oss
- voxtral
Citation
@article{wijmans2024cut,
author = {Erik Wijmans and
Brody Huval and
Alexander Hertzberg and
Vladlen Koltun and
Philipp Kr\"ahenb\"uhl},
title = {Cut Your Losses in Large-Vocabulary Language Models},
journal = {arXiv},
year = {2024},
url = {https://arxiv.org/abs/2411.09009},
}
Please see reference here
DenseMixer
See DenseMixer
Simply add the following to your axolotl YAML config:
plugins:
- axolotl.integrations.densemixer.DenseMixerPlugin
Please see reference here
Diffusion LM Training Plugin for Axolotl
This plugin enables diffusion language model training using an approach inspired by LLaDA (Large Language Diffusion Models) within Axolotl.
Overview
LLaDA is a diffusion-based approach to language model training that uses:
- Random token masking during training instead of next-token prediction
- Bidirectional attention to allow the model to attend to the full context
- Importance weighting based on masking probabilities for stable training
This approach can lead to more robust language models with better understanding of bidirectional context.
Installation
The plugin is included with Axolotl. See our installation docs.
Quickstart
Train with an example config (Llama‑3.2 1B):
- Pretrain: axolotl train examples/llama-3/diffusion-3.2-1b-pretrain.yaml
- SFT: axolotl train examples/llama-3/diffusion-3.2-1b-sft.yaml
Basic Configuration
You can also modify your existing configs to enable and customize diffusion training.
Add the following to your Axolotl config:
plugins:
- axolotl.integrations.diffusion.DiffusionPlugin
Then configure the nested diffusion block (defaults shown):
diffusion:
  noise_schedule: linear # or "cosine"
  min_mask_ratio: 0.1
  max_mask_ratio: 0.9
  num_diffusion_steps: 128
  eps: 1e-3
  importance_weighting: true
  # Mask token (training auto-adds if missing, avoid pad/eos)
  mask_token_str: "<|diffusion_mask|>"
  # Or use an existing special token id (e.g., 128002 for Llama-3.x)
  # mask_token_id: 128002
  # Sample generation during training (optional)
  generate_samples: true
  generation_interval: 100
  num_generation_samples: 3
  generation_steps: 128
  generation_temperature: 0.0
  generation_max_length: 100
Supported Models
Any models that support 4D attention masks should work out of the box. If not, please create an issue or open a PR!
How It Works
Random Masking
During training, tokens are randomly masked:
- Sample timestep t uniformly from [0, 1]
- Calculate masking probability: p = (1 - eps) * t + eps
- Randomly mask tokens with probability p
Diffusion Loss
Loss is computed only on masked tokens with (optional) importance weighting:
loss = sum(cross_entropy(pred, target) / p_mask) / total_tokens
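A minimal PyTorch sketch of these two steps (masking and the importance-weighted loss); this is our illustration, not the plugin's actual implementation, and variable names are ours:
import torch
import torch.nn.functional as F

def diffusion_loss(model, input_ids, mask_token_id, eps=1e-3):
    """Sketch of one LLaDA-style training step on a (batch, seq_len) batch."""
    # 1) Sample timestep t ~ U[0, 1]; masking probability p = (1 - eps) * t + eps
    t = torch.rand(input_ids.size(0), 1, device=input_ids.device)
    p_mask = (1 - eps) * t + eps
    # 2) Randomly mask tokens with probability p (broadcast (B, L) < (B, 1))
    masked = torch.rand(input_ids.shape, device=input_ids.device) < p_mask
    noisy = torch.where(masked, torch.full_like(input_ids, mask_token_id), input_ids)
    # 3) Bidirectional forward pass; loss only on masked positions
    logits = model(noisy).logits
    ce = F.cross_entropy(logits[masked], input_ids[masked], reduction="none")
    # 4) Importance weighting: divide each token's loss by its masking probability
    weights = p_mask.expand_as(input_ids)[masked]
    return (ce / weights).sum() / input_ids.numel()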
Sample Generation
When diffusion.generate_samples: true, the plugin generates samples during training:
Sample 1:
Original (45 tokens): The quick brown fox jumps over the lazy dog...
Masked (18/45 tokens, 40.0%): The [MASK] [MASK] fox [MASK] over [MASK] lazy [MASK]...
Generated: The quick brown fox jumps over the lazy dog...
Samples are logged to console and wandb (if enabled).
Inference
Diffusion inference is integrated into the standard Axolotl CLI. Use the same config you trained with and run:
axolotl inference path/to/your-config.yaml
Optionally, pass --gradio to use a simple web interface.
Interactive controls (prefix the prompt with commands):
- :complete N → completion mode with N new masked tokens appended (default 64)
- :mask R → random masking mode with target mask ratio R in [0.0, 1.0]
Example session:
================================================================================
Commands:
:complete N -> completion mode with N tokens (default 64)
:mask R -> random masking with ratio R (0.0–1.0)
================================================================================
Give me an instruction (Ctrl + D to submit):
:mask 0.4 The quick brown fox jumps over the lazy dog
Masked (40.0%):
The [MASK] brown [MASK] jumps over the [MASK] dog
Generated:
The quick brown fox jumps over the loud dog
Metrics and Monitoring
The plugin adds (or modifies) several metrics to track diffusion training:
- train/loss: Weighted diffusion loss
- train/accuracy: Accuracy on masked tokens
- train/mask_ratio: Average fraction of tokens masked
- train/num_masked_tokens: Number of tokens masked
- train/avg_p_mask: Average masking probability
- train/ce_loss: Unweighted cross-entropy loss
- train/importance_weight_avg: Average importance weight
Limitations
- No flash attention support
- No RL training support
References
Please see reference here
Grokfast
See https://github.com/ironjr/grokfast
Usage
plugins:
- axolotl.integrations.grokfast.GrokfastPlugin
grokfast_alpha: 2.0
grokfast_lamb: 0.98
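Grokfast accelerates grokking by amplifying the slow-varying component of the gradients with an exponential moving average (EMA) filter: grokfast_lamb is the EMA decay and grokfast_alpha scales how strongly the slow component is added back. A sketch of the EMA variant (our illustration; parameter names follow the Axolotl config, and the plugin applies this for you):
import torch

@torch.no_grad()
def gradfilter_ema(model, ema_grads, alpha=2.0, lamb=0.98):
    """Sketch of Grokfast's EMA gradient filter. Call after backward() and
    before optimizer.step(); initialize ema_grads as a dict of zeros shaped
    like each parameter's gradient."""
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        # ema <- lamb * ema + (1 - lamb) * grad  (slow component of the gradient)
        ema_grads[name].mul_(lamb).add_(param.grad, alpha=1 - lamb)
        # grad <- grad + alpha * ema  (amplify the slow component)
        param.grad.add_(ema_grads[name], alpha=alpha)
    return ema_grads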
Citation
@article{lee2024grokfast,
title={{Grokfast}: Accelerated Grokking by Amplifying Slow Gradients},
author={Lee, Jaerin and Kang, Bong Gyun and Kim, Kihoon and Lee, Kyoung Mu},
journal={arXiv preprint arXiv:2405.20233},
year={2024}
}
Please see reference here
Knowledge Distillation (KD)
Usage
plugins:
- "axolotl.integrations.kd.KDPlugin"
kd_trainer: True
kd_ce_alpha: 0.1
kd_alpha: 0.9
kd_temperature: 1.0
torch_compile: True # torch>=2.6.0, recommended to reduce vram
datasets:
- path: ...
  type: "axolotl.integrations.kd.chat_template"
  field_messages: "messages_combined"
  logprobs_field: "llm_text_generation_vllm_logprobs" # for kd only, field of logprobs
An example dataset can be found at axolotl-ai-co/evolkit-logprobs-pipeline-75k-v2-sample.
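For intuition, the three knobs typically combine like standard distillation: a weighted sum of the hard-label cross-entropy and a temperature-scaled KL divergence to the teacher. A hedged sketch (not the trainer's exact implementation; names are ours):
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels,
            ce_alpha=0.1, kd_alpha=0.9, temperature=1.0):
    """Sketch: hard-label CE plus temperature-scaled KL to the teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.log_softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * temperature**2  # T^2 keeps gradient magnitudes comparable across temperatures
    return ce_alpha * ce + kd_alpha * kl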
Please see reference here
LLMCompressor
Fine-tune sparsified models in Axolotl using Neural Magic’s LLMCompressor.
This integration enables fine-tuning of models sparsified using LLMCompressor within the Axolotl training framework. By combining LLMCompressor’s model compression capabilities with Axolotl’s distributed training pipelines, users can efficiently fine-tune sparse models at scale.
It uses Axolotl’s plugin system to hook into the fine-tuning flows while maintaining sparsity throughout training.
Requirements
Axolotl with llmcompressor extras:
pip install "axolotl[llmcompressor]"
Requires llmcompressor >= 0.5.1.
This will install all necessary dependencies to fine-tune sparsified models using the integration.
Usage
To enable sparse fine-tuning with this integration, include the plugin in your Axolotl config:
plugins:
- axolotl.integrations.llm_compressor.LLMCompressorPlugin
llmcompressor:
  recipe:
    finetuning_stage:
      finetuning_modifiers:
        ConstantPruningModifier:
          targets: [
            're:.*q_proj.weight',
            're:.*k_proj.weight',
            're:.*v_proj.weight',
            're:.*o_proj.weight',
            're:.*gate_proj.weight',
            're:.*up_proj.weight',
            're:.*down_proj.weight',
          ]
          start: 0
  save_compressed: true
This plugin does not apply pruning or sparsification itself — it is intended for fine-tuning models that have already been sparsified.
Pre-sparsified checkpoints can be:
- Generated using LLMCompressor
- Downloaded from Neural Magic’s Hugging Face page
- Any custom LLM with compatible sparsity patterns that you’ve created yourself
To learn more about writing and customizing LLMCompressor recipes, refer to the official documentation: https://github.com/vllm-project/llm-compressor/blob/main/README.md
Storage Optimization with save_compressed
Setting save_compressed: true in your configuration enables saving models in a compressed format, which:
- Reduces disk space usage by approximately 40%
- Maintains compatibility with vLLM for accelerated inference
- Maintains compatibility with llmcompressor for further optimization (example: quantization)
This option is highly recommended when working with sparse models to maximize the benefits of model compression.
Example Config
See examples/llama-3/sparse-finetuning.yaml for a complete example.
Inference with vLLM
After fine-tuning your sparse model, you can leverage vLLM for efficient inference. You can also use LLMCompressor to apply additional quantization to your fine-tuned sparse model before inference for even greater performance benefits:
from vllm import LLM, SamplingParams
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM("path/to/your/sparse/model")
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
For more details on vLLM’s capabilities and advanced configuration options, see the official vLLM documentation.
Learn More
For details on available sparsity and quantization schemes, fine-tuning recipes, and usage examples, visit the official LLMCompressor repository:
https://github.com/vllm-project/llm-compressor
Please see reference here
Language Model Evaluation Harness (LM Eval)
Run evaluation on a model using the popular lm-evaluation-harness library.
See https://github.com/EleutherAI/lm-evaluation-harness
Usage
plugins:
- axolotl.integrations.lm_eval.LMEvalPlugin
lm_eval_tasks:
- gsm8k
- hellaswag
- arc_easy
lm_eval_batch_size: # Batch size for evaluation
output_dir: # Directory to save evaluation results
Citation
@misc{eval-harness,
author = {Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy},
title = {A framework for few-shot language model evaluation},
month = 07,
year = 2024,
publisher = {Zenodo},
version = {v0.4.3},
doi = {10.5281/zenodo.12608602},
url = {https://zenodo.org/records/12608602}
}
Please see reference here
Liger Kernels
Liger Kernel provides efficient Triton kernels for LLM training, offering:
- 20% increase in multi-GPU training throughput
- 60% reduction in memory usage
- Compatibility with both FSDP and DeepSpeed
See https://github.com/linkedin/Liger-Kernel
Usage
plugins:
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true
liger_use_token_scaling: true
Supported Models
- deepseek_v2
- gemma
- gemma2
- gemma3
- granite
- jamba
- llama
- mistral
- mixtral
- mllama
- mllama_text_model
- olmo2
- paligemma
- phi3
- qwen2
- qwen2_5_vl
- qwen2_vl
Citation
@article{hsu2024ligerkernelefficienttriton,
title={Liger Kernel: Efficient Triton Kernels for LLM Training},
author={Pin-Lun Hsu and Yun Dai and Vignesh Kothapalli and Qingquan Song and Shao Tang and Siyu Zhu and Steven Shimizu and Shivam Sahni and Haowen Ning and Yanning Chen},
year={2024},
eprint={2410.10989},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.10989},
journal={arXiv preprint arXiv:2410.10989},
}
Please see reference here
Spectrum
by Eric Hartford, Lucas Atkins, Fernando Fernandes, David Golchinfar
This plugin contains code to freeze the bottom fraction of modules in a model, based on the Signal-to-Noise Ratio (SNR).
See https://github.com/cognitivecomputations/spectrum
Overview
Spectrum is a tool for scanning and evaluating the Signal-to-Noise Ratio (SNR) of layers in large language models. By identifying the top n% of layers with the highest SNR, you can optimize training efficiency.
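Conceptually, the plugin ranks modules by SNR and freezes everything outside the top fraction. A hypothetical sketch of that freezing step (the snr_by_module scores would come from a Spectrum scan; all names here are illustrative, not the plugin's actual code):
def freeze_bottom_by_snr(model, snr_by_module, top_fraction=0.5):
    """Illustrative sketch: keep only the top `top_fraction` of modules
    (ranked by SNR) trainable and freeze the rest."""
    ranked = sorted(snr_by_module, key=snr_by_module.get, reverse=True)
    keep = set(ranked[: int(len(ranked) * top_fraction)])
    for name, param in model.named_parameters():
        module_name = name.rsplit(".", 1)[0]  # strip ".weight"/".bias"
        param.requires_grad = module_name in keep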
Usage
plugins:
- axolotl.integrations.spectrum.SpectrumPlugin
spectrum_top_fraction: 0.5
spectrum_model_name: meta-llama/Meta-Llama-3.1-8B
Citation
@misc{hartford2024spectrumtargetedtrainingsignal,
title={Spectrum: Targeted Training on Signal to Noise Ratio},
author={Eric Hartford and Lucas Atkins and Fernando Fernandes Neto and David Golchinfar},
year={2024},
eprint={2406.06623},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2406.06623},
}
Please see reference here
SwanLab Integration for Axolotl
SwanLab is an open-source, lightweight AI experiment tracking and visualization tool that provides a platform for tracking, recording, comparing, and collaborating on experiments.
This integration enables seamless experiment tracking and visualization of Axolotl training runs using SwanLab.
Features
- 📊 Automatic Metrics Logging: Training loss, learning rate, and other metrics are automatically logged
- 🎯 Hyperparameter Tracking: Model configuration and training parameters are tracked
- 📈 Real-time Visualization: Monitor training progress in real-time through SwanLab dashboard
- ☁️ Cloud & Local Support: Works in both cloud-synced and offline modes
- 🔄 Experiment Comparison: Compare multiple training runs easily
- 🤝 Team Collaboration: Share experiments with team members
- 🎭 RLHF Completion Logging: Automatically log model outputs during DPO/KTO/ORPO/GRPO training for qualitative analysis
- ⚡ Performance Profiling: Built-in profiling decorators to measure and optimize training performance
- 🔔 Lark Notifications: Send real-time training updates to team chat (Feishu/Lark integration)
Installation
pip install swanlab
Quick Start
1. Register for SwanLab (optional; required only for cloud mode)
If you want to use cloud sync features, register at https://swanlab.cn to get your API key.
2. Configure Axolotl Config File
Add SwanLab configuration to your Axolotl YAML config:
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
use_swanlab: true
swanlab_project: my-llm-project
swanlab_experiment_name: qwen-finetune-v1
swanlab_mode: cloud # Options: cloud, local, offline, disabled
swanlab_workspace: my-team # Optional: organization name
swanlab_api_key: YOUR_API_KEY # Optional: can also use env var SWANLAB_API_KEY
3. Run Training
export SWANLAB_API_KEY=your-api-key-here
swanlab login
accelerate launch -m axolotl.cli.train your-config.yaml
Configuration Options
Basic Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_swanlab | bool | false | Enable SwanLab tracking |
| swanlab_project | str | None | Project name (required) |
| swanlab_experiment_name | str | None | Experiment name |
| swanlab_description | str | None | Experiment description |
| swanlab_mode | str | cloud | Sync mode: cloud, local, offline, disabled |
Advanced Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| swanlab_workspace | str | None | Workspace/organization name |
| swanlab_api_key | str | None | API key (prefer env var) |
| swanlab_web_host | str | None | Private deployment web host |
| swanlab_api_host | str | None | Private deployment API host |
| swanlab_log_model | bool | false | Log model checkpoints (coming soon) |
| swanlab_lark_webhook_url | str | None | Lark (Feishu) webhook URL for team notifications |
| swanlab_lark_secret | str | None | Lark webhook HMAC secret for authentication |
| swanlab_log_completions | bool | true | Enable RLHF completion table logging (DPO/KTO/ORPO/GRPO) |
| swanlab_completion_log_interval | int | 100 | Steps between completion logging |
| swanlab_completion_max_buffer | int | 128 | Max completions to buffer (memory bound) |
Configuration Examples
Example 1: Basic Cloud Sync
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
use_swanlab: true
swanlab_project: llama-finetune
swanlab_experiment_name: llama-3-8b-instruct-v1
swanlab_mode: cloud
Example 2: Offline/Local Mode
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
use_swanlab: true
swanlab_project: local-experiments
swanlab_experiment_name: test-run-1
swanlab_mode: local # or 'offline'
Example 3: Team Workspace
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
use_swanlab: true
swanlab_project: research-project
swanlab_experiment_name: experiment-42
swanlab_workspace: my-research-team
swanlab_mode: cloud
Example 4: Private Deployment
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
use_swanlab: true
swanlab_project: internal-project
swanlab_experiment_name: secure-training
swanlab_mode: cloud
swanlab_web_host: https://swanlab.yourcompany.com
swanlab_api_host: https://api.swanlab.yourcompany.com
Team Notifications with Lark (Feishu)
SwanLab supports sending real-time training notifications to your team chat via Lark (Feishu), ByteDance’s enterprise collaboration platform. This is especially useful for:
- Production training monitoring: Get alerts when training starts, completes, or encounters errors
- Team collaboration: Keep your ML team informed about long-running experiments
- Multi-timezone teams: Team members can check training progress without being online
Prerequisites
- Lark Bot Setup: Create a custom bot in your Lark group chat
- Webhook URL: Get the webhook URL from your Lark bot settings
- HMAC Secret (recommended): Enable signature verification in your Lark bot for security
For detailed Lark bot setup instructions, see Lark Custom Bot Documentation.
Example 5: Basic Lark Notifications
Send training notifications to a Lark group chat:
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
use_swanlab: true
swanlab_project: production-training
swanlab_experiment_name: llama-3-finetune-v2
swanlab_mode: cloud
swanlab_lark_webhook_url: https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxxx
Note: This configuration will work, but you’ll see a security warning recommending HMAC secret configuration.
Example 6: Lark Notifications with HMAC Security (Recommended)
For production use, enable HMAC signature verification:
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
use_swanlab: true
swanlab_project: production-training
swanlab_experiment_name: llama-3-finetune-v2
swanlab_mode: cloud
swanlab_lark_webhook_url: https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxxx
swanlab_lark_secret: your-webhook-secret-key
Why HMAC secret matters:
- Prevents unauthorized parties from sending fake notifications to your Lark group
- Ensures notifications genuinely come from your training jobs
- Required for production deployments with sensitive training data
Example 7: Team Workspace + Lark Notifications
Combine team workspace collaboration with Lark notifications:
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
use_swanlab: true
swanlab_project: research-project
swanlab_experiment_name: multimodal-experiment-42
swanlab_workspace: ml-research-team
swanlab_mode: cloud
swanlab_lark_webhook_url: https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxxx
swanlab_lark_secret: your-webhook-secret-key
What Notifications Are Sent?
SwanLab’s Lark integration sends notifications for key training events:
- Training Start: When your experiment begins
- Training Complete: When training finishes successfully
- Training Errors: If training crashes or encounters critical errors
- Metric Milestones: Configurable alerts for metric thresholds (if configured in SwanLab)
Each notification includes:
- Experiment name and project
- Training status
- Key metrics (loss, learning rate)
- Direct link to SwanLab dashboard
Lark Configuration Validation
The plugin validates your Lark configuration at startup:
✅ Valid Configurations
# Minimal configuration
use_swanlab: true
swanlab_project: my-project

# Lark notifications with HMAC secret
use_swanlab: true
swanlab_project: my-project
swanlab_lark_webhook_url: https://open.feishu.cn/open-apis/bot/v2/hook/xxx
swanlab_lark_secret: your-secret

# Lark notifications without secret (works, but logs a security warning)
use_swanlab: true
swanlab_project: my-project
swanlab_lark_webhook_url: https://open.feishu.cn/open-apis/bot/v2/hook/xxx
Security Best Practices
Always use HMAC secret in production:
swanlab_lark_webhook_url: https://open.feishu.cn/...
swanlab_lark_secret: your-secret-key # ✅ Add this!
Store secrets in environment variables (even better):
# In your training script/environment
export SWANLAB_LARK_WEBHOOK_URL="https://open.feishu.cn/..."
export SWANLAB_LARK_SECRET="your-secret-key"
Then in config:
# SwanLab plugin will auto-detect environment variables
use_swanlab: true
swanlab_project: my-project
# Lark URL and secret read from env vars
Rotate webhook secrets periodically: Update your Lark bot’s secret every 90 days
Use separate webhooks for dev/prod: Don’t mix development and production notifications
Distributed Training
Lark notifications are automatically deduplicated in distributed training:
- Only rank 0 sends notifications
- Other GPU ranks skip Lark registration
- Prevents duplicate messages in multi-GPU training
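The gating idea is simply a rank check before registering the callback. A hypothetical sketch (is_main_process is an illustrative name, not the plugin's API):
import os

def is_main_process() -> bool:
    """torchrun sets RANK per process; default to 0 for single-process runs."""
    return int(os.environ.get("RANK", "0")) == 0

# Illustrative gate: only rank 0 registers the Lark callback, so only rank 0 notifies.
if is_main_process():
    ...  # register the Lark notification callback here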
For example, a 4-GPU launch:
torchrun --nproc_per_node=4 -m axolotl.cli.train config.yml
RLHF Completion Table Logging
For RLHF (Reinforcement Learning from Human Feedback) training methods like DPO, KTO, ORPO, and GRPO, SwanLab can log model completions (prompts, chosen/rejected responses, rewards) to a visual table for qualitative analysis. This helps you:
- Inspect model behavior: See actual model outputs during training
- Debug preference learning: Compare chosen vs rejected responses
- Track reward patterns: Monitor how rewards evolve over training
- Share examples with team: Visual tables in SwanLab dashboard
Features
- ✅ Automatic detection: Works with DPO, KTO, ORPO, GRPO trainers
- ✅ Memory-safe buffering: Bounded buffer prevents memory leaks in long training runs
- ✅ Periodic logging: Configurable logging interval to reduce overhead
- ✅ Rich visualization: SwanLab tables show prompts, responses, and metrics side-by-side
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| swanlab_log_completions | bool | true | Enable completion logging for RLHF trainers |
| swanlab_completion_log_interval | int | 100 | Log completions to SwanLab every N training steps |
| swanlab_completion_max_buffer | int | 128 | Maximum completions to buffer (memory bound) |
Example: DPO Training with Completion Logging
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
use_swanlab: true
swanlab_project: dpo-training
swanlab_experiment_name: llama-3-dpo-v1
swanlab_mode: cloud
swanlab_log_completions: true
swanlab_completion_log_interval: 100 # Log every 100 steps
swanlab_completion_max_buffer: 128 # Keep last 128 completions
rl: dpo
datasets:
- path: /path/to/preference_dataset
  type: chatml.intel
Example: Disable Completion Logging
If you’re doing a quick test run or don’t need completion tables:
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
use_swanlab: true
swanlab_project: dpo-training
swanlab_log_completions: false
Supported RLHF Trainers
The completion logging callback automatically activates for these trainer types:
- DPO (Direct Preference Optimization): Logs prompts, chosen, rejected, reward_diff
- KTO (Kahneman-Tversky Optimization): Logs prompts, completions, labels, rewards
- ORPO (Odds Ratio Preference Optimization): Logs prompts, chosen, rejected, log_odds_ratio
- GRPO (Group Relative Policy Optimization): Logs prompts, completions, rewards, advantages
- CPO (Contrastive Preference Optimization): Logs prompts, chosen, rejected
For non-RLHF trainers (standard supervised fine-tuning), the completion callback is automatically skipped.
How It Works
- Auto-detection: Plugin detects trainer type at initialization
- Buffering: Completions are buffered in memory (up to swanlab_completion_max_buffer)
- Periodic logging: Every swanlab_completion_log_interval steps, the buffer is logged to SwanLab
- Memory safety: Old completions are automatically dropped when the buffer is full (uses collections.deque)
- Final flush: Remaining completions are logged when training completes
Viewing Completion Tables
After training starts, you can view completion tables in your SwanLab dashboard:
- Navigate to your experiment in SwanLab
- Look for the “rlhf_completions” table in the metrics panel
- The table shows:
- step: Training step when completion was generated
- prompt: Input prompt
- chosen: Preferred response (DPO/ORPO)
- rejected: Non-preferred response (DPO/ORPO)
- completion: Model output (KTO/GRPO)
- reward_diff/reward: Reward metrics
- Trainer-specific metrics (e.g., log_odds_ratio for ORPO)
Memory Management
The completion buffer is memory-bounded to prevent memory leaks:
from collections import deque
buffer = deque(maxlen=128)  # Old completions automatically dropped
Memory usage estimate:
- Average completion: ~500 characters (prompt + responses)
- Buffer size 128: ~64 KB (negligible)
- Buffer size 1024: ~512 KB (still small)
Recommendation: Default buffer size (128) works well for most cases. Increase to 512-1024 only if you need to review more historical completions.
Performance Impact
Completion logging has minimal overhead:
- Buffering: O(1) append operation, negligible CPU/memory
- Logging: Only happens every N steps (default: 100)
- Network: SwanLab batches table uploads efficiently
Expected overhead: < 0.5% per training step
Troubleshooting
Completions not appearing in SwanLab
Cause: Trainer may not be logging completion data in the expected format.
Diagnostic steps:
1. Check trainer type detection in logs:
INFO: SwanLab RLHF completion logging enabled for DPOTrainer (type: dpo)
2. Verify your trainer is an RLHF trainer (DPO/KTO/ORPO/GRPO)
3. Check if trainer logs completion data (this depends on TRL version)
Note: The current implementation expects trainers to log completion data in the logs dict during on_log() callback. Some TRL trainers may not expose this data by default. You may need to patch the trainer to expose completions.
Buffer fills up too quickly
Cause: High logging frequency with small buffer size.
Solution: Increase buffer size or logging interval:
swanlab_completion_log_interval: 200 # Log less frequently
swanlab_completion_max_buffer: 512 # Larger buffer
Memory usage growing over time
Cause: Buffer should be bounded, so this indicates a bug.
Solution:
1. Verify swanlab_completion_max_buffer is set
2. Check SwanLab version is up to date
3. Report issue with memory profiling data
Performance Profiling
SwanLab integration includes profiling utilities to measure and log execution time of trainer methods. This helps you:
- Identify bottlenecks: Find slow operations in your training loop
- Optimize performance: Track improvements after optimization changes
- Monitor distributed training: See per-rank timing differences
- Debug hangs: Detect methods that take unexpectedly long
Features
- ✅ Zero-config profiling: Automatic timing of key trainer methods
- ✅ Decorator-based: Easy to add profiling to custom methods with @swanlab_profile
- ✅ Context manager: Fine-grained profiling with swanlab_profiling_context()
- ✅ Advanced filtering: ProfilingConfig for throttling and minimum duration thresholds
- ✅ Exception-safe: Logs duration even if the function raises an exception
Basic Usage: Decorator
Add profiling to any trainer method with the @swanlab_profile decorator:
from axolotl.integrations.swanlab.profiling import swanlab_profile
class MyCustomTrainer(AxolotlTrainer):
    @swanlab_profile
    def training_step(self, model, inputs):
        # Your training step logic
        return super().training_step(model, inputs)

    @swanlab_profile
    def prediction_step(self, model, inputs, prediction_loss_only):
        # Your prediction logic
        return super().prediction_step(model, inputs, prediction_loss_only)
The decorator automatically:
1. Measures execution time with high-precision timer
2. Logs to SwanLab as profiling/Time taken: ClassName.method_name
3. Only logs if SwanLab is enabled (use_swanlab: true)
4. Gracefully handles exceptions (logs duration, then re-raises)
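For intuition, an exception-safe timing decorator of this shape can be sketched as follows (our illustration only, not the actual swanlab_profile source):
import functools
import time

import swanlab

def profile_sketch(fn):
    """Illustrative sketch of an exception-safe timing decorator."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(self, *args, **kwargs)
        finally:
            # Runs even if fn raised; the exception still propagates.
            elapsed = time.perf_counter() - start
            swanlab.log(
                {f"profiling/Time taken: {type(self).__name__}.{fn.__name__}": elapsed}
            )
    return wrapper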
Advanced Usage: Context Manager
For fine-grained profiling within a method:
from axolotl.integrations.swanlab.profiling import swanlab_profiling_context
class MyTrainer(AxolotlTrainer):
    def complex_training_step(self, model, inputs):
        # Profile just the forward pass
        with swanlab_profiling_context(self, "forward_pass"):
            outputs = model(**inputs)
        # Profile just the backward pass
        with swanlab_profiling_context(self, "backward_pass"):
            loss = outputs.loss
            loss.backward()
        return outputs
Advanced Usage: ProfilingConfig
Filter and throttle profiling logs with ProfilingConfig:
from axolotl.integrations.swanlab.profiling import (
    swanlab_profiling_context_advanced,
    ProfilingConfig,
)

profiling_config = ProfilingConfig(
    enabled=True,
    min_duration_ms=1.0,  # Only log if duration > 1ms
    log_interval=10,      # Log every 10th call
)

class MyTrainer(AxolotlTrainer):
    def frequently_called_method(self, data):
        with swanlab_profiling_context_advanced(
            self,
            "frequent_op",
            config=profiling_config,
        ):
            # This only logs every 10th call, and only if it takes > 1ms
            result = expensive_computation(data)
            return result
ProfilingConfig Parameters:
- enabled: Enable/disable profiling globally (default: True)
- min_duration_ms: Minimum duration to log in milliseconds (default: 0.1)
- log_interval: Log every Nth function call (default: 1 = log all)
Use cases:
- High-frequency methods: Use log_interval=100 to reduce logging overhead
- Filter noise: Use min_duration_ms=1.0 to skip very fast operations
- Debugging: Use log_interval=1, min_duration_ms=0.0 to log everything
Viewing Profiling Metrics
In your SwanLab dashboard, profiling metrics appear under the “profiling” namespace:
profiling/Time taken: AxolotlTrainer.training_step
profiling/Time taken: AxolotlTrainer.prediction_step
profiling/Time taken: MyTrainer.forward_pass
profiling/Time taken: MyTrainer.backward_pass
You can:
- Track over time: See if methods get faster/slower during training
- Compare runs: Compare profiling metrics across experiments
- Identify regressions: Detect if a code change slowed down training
Configuration in Axolotl Config
Profiling is automatically enabled when SwanLab is enabled. No additional config needed:
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
use_swanlab: true
swanlab_project: my-project
To disable profiling while keeping SwanLab enabled:
from axolotl.integrations.swanlab.profiling import DEFAULT_PROFILING_CONFIG
DEFAULT_PROFILING_CONFIG.enabled = False
Performance Impact
- Decorator overhead: ~2-5 microseconds per call (negligible)
- Context manager overhead: ~1-3 microseconds (negligible)
- Logging overhead: Only when SwanLab is enabled and method duration exceeds threshold
- Network overhead: SwanLab batches metrics efficiently
Expected overhead: < 0.1% per training step (effectively zero)
Best Practices
- Profile bottlenecks first: Start by profiling suspected slow operations
- Use min_duration_ms: Filter out fast operations (< 1ms) to reduce noise
- Throttle high-frequency calls: Use log_interval for methods called > 100 times/step
- Profile across runs: Compare profiling metrics before/after optimization
- Monitor distributed training: Check for rank-specific slowdowns
Example: Complete Profiling Setup
from axolotl.integrations.swanlab.profiling import (
    swanlab_profile,
    swanlab_profiling_context_advanced,
    ProfilingConfig,
)

class OptimizedTrainer(AxolotlTrainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Custom profiling config for high-frequency operations
        self.fast_op_config = ProfilingConfig(
            enabled=True,
            min_duration_ms=0.5,
            log_interval=50,
        )

    @swanlab_profile
    def training_step(self, model, inputs):
        """Main training step - always profile."""
        return super().training_step(model, inputs)

    @swanlab_profile
    def compute_loss(self, model, inputs, return_outputs=False):
        """Loss computation - always profile."""
        return super().compute_loss(model, inputs, return_outputs)

    def _prepare_inputs(self, inputs):
        """High-frequency operation - throttled profiling."""
        with swanlab_profiling_context_advanced(
            self,
            "prepare_inputs",
            config=self.fast_op_config,
        ):
            return super()._prepare_inputs(inputs)
Troubleshooting
Profiling metrics not appearing in SwanLab
Cause: SwanLab is not enabled or not initialized.
Solution:
use_swanlab: true
swanlab_project: my-project
Check logs for:
INFO: SwanLab initialized for project: my-project
Too many profiling metrics cluttering dashboard
Cause: Profiling every function call for high-frequency operations.
Solution: Use ProfilingConfig with throttling:
config = ProfilingConfig(
    min_duration_ms=1.0,  # Skip fast ops
    log_interval=100,     # Log every 100th call
)
Profiling overhead impacting training speed
Cause: Profiling itself should have negligible overhead (< 0.1%). If you see > 1% slowdown, this indicates a bug.
Solution:
1. Disable profiling temporarily to confirm:
DEFAULT_PROFILING_CONFIG.enabled = False
2. Report issue with profiling data and trainer details
Profiling shows inconsistent timing
Cause: Normal variation due to GPU warmup, data loading, or system load.
Solution:
- Ignore first few steps (warmup period)
- Look at average/median timing over many steps
- Use log_interval to reduce noise from individual outliers
Complete Config Example
Here’s a complete example integrating SwanLab into a training config:
base_model: /path/to/your/model
model_type: Qwen2ForCausalLM
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
use_swanlab: true
swanlab_project: RVQ-Alpha-Training
swanlab_experiment_name: Qwen2.5-7B-MetaQA-Perturb-P020
swanlab_description: "Training on MetaQA and Perturbation datasets with NEW-RVQ encoding"
swanlab_mode: cloud
swanlab_workspace: single-cell-genomics
sequence_len: 32768
micro_batch_size: 1
gradient_accumulation_steps: 1
num_epochs: 2
learning_rate: 2e-5
optimizer: adamw_torch_fused
datasets:
- path: /path/to/dataset
  type: chat_template
output_dir: ./outputs
Modes Explained
cloud Mode (Default)
- Syncs experiments to SwanLab cloud in real-time
- Requires API key and internet connection
- Best for: Team collaboration, remote monitoring
local Mode
- Saves experiments locally only
- No cloud sync
- Best for: Local development, air-gapped environments
offline Mode
- Saves metadata locally
- Can sync to cloud later using swanlab sync
- Best for: Unstable internet, sync later
disabled Mode
- Turns off SwanLab completely
- No logging or tracking
- Best for: Debugging, testing
Configuration Validation & Conflict Detection
SwanLab integration includes comprehensive validation and conflict detection to help you catch configuration errors early and avoid performance issues.
Required Fields Validation
The plugin validates your configuration at startup and provides clear error messages with solutions:
Missing Project Name
use_swanlab: true
Solution:
use_swanlab: true
swanlab_project: my-project
Invalid Mode
use_swanlab: true
swanlab_project: my-project
swanlab_mode: invalid-mode
Solution:
use_swanlab: true
swanlab_project: my-project
swanlab_mode: cloud # or: local, offline, disabled
Empty Project Name
use_swanlab: true
swanlab_project: ""
Solution:
use_swanlab: true
swanlab_project: my-project
Cloud Mode API Key Warning
When using cloud mode without an API key, you’ll receive a warning with multiple solutions:
use_swanlab: true
swanlab_project: my-project
swanlab_mode: cloud
Solutions:
1. Set environment variable: export SWANLAB_API_KEY=your-api-key
2. Add to config (less secure): swanlab_api_key: your-api-key
3. Run swanlab login before training
4. Use swanlab_mode: local for offline tracking
Multi-Logger Performance Warnings
Using multiple logging tools simultaneously (SwanLab + WandB + MLflow + Comet) can impact training performance:
Two Loggers - Warning
use_swanlab: true
swanlab_project: my-project
use_wandb: true
wandb_project: my-project
Impact:
- Performance overhead: ~1-2% per logger (cumulative)
- Increased memory usage
- Longer training time per step
- Potential config/callback conflicts
Recommendations:
- Choose ONE primary logging tool for production training
- Use multiple loggers only for:
  - Migration period (transitioning between tools)
  - Short comparison runs
  - Debugging specific tool issues
- Monitor system resources (CPU, memory) during training
Three+ Loggers - Error-Level Warning
use_swanlab: true
swanlab_project: my-project
use_wandb: true
wandb_project: my-project
use_mlflow: true
mlflow_tracking_uri: http://localhost:5000
Why This Matters:
- With 3 loggers: ~4-5% overhead per step → significant slowdown over long training
- Example: 10,000 steps at 2 s/step → ~800-1,000 seconds extra (13-17 minutes)
- Memory overhead scales with the number of loggers
- Rare edge cases with callback ordering conflicts
Auto-Enable Logic
For convenience, SwanLab will auto-enable if you specify a project without setting use_swanlab:
# You write:
swanlab_project: my-project
# The plugin effectively applies:
use_swanlab: true
swanlab_project: my-project
In distributed training scenarios (multi-GPU), the plugin automatically detects and reports:
use_swanlab: true
swanlab_project: my-project
swanlab_mode: cloud
Why Only Rank 0:
- Avoids duplicate experiment runs
- Reduces network/cloud API overhead on worker ranks
- Prevents race conditions in metric logging
Authentication
Method 1: Environment Variable (Recommended)
export SWANLAB_API_KEY=your-api-key-here
Method 2: Login Command
swanlab login
Method 3: Config File
swanlab_api_key: your-api-key-here
What Gets Logged?
Automatically Logged Metrics
- Training loss
- Learning rate
- Gradient norm
- Training steps
- Epoch progress
Automatically Logged Config
- Model configuration (base_model, model_type)
- Training hyperparameters (learning_rate, batch_size, etc.)
- Optimizer settings
- Parallelization settings (FSDP, DeepSpeed, Context Parallel)
- Axolotl configuration file
- DeepSpeed configuration (if used)
Viewing Your Experiments
Cloud Mode
Visit https://swanlab.cn and navigate to your project to view:
- Real-time training metrics
- Hyperparameter comparison
- System resource usage
- Configuration files
Local Mode
swanlab watch ./swanlog
Integration with Existing Tools
SwanLab can work alongside other tracking tools:
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin
use_swanlab: true
swanlab_project: my-project
use_wandb: true
wandb_project: my-projectTroubleshooting
Configuration Errors
Error: “SwanLab enabled but ‘swanlab_project’ is not set”
Cause: You enabled SwanLab (use_swanlab: true) but forgot to specify a project name.
Solution:
use_swanlab: true
swanlab_project: my-project # Add this line
Error: “Invalid swanlab_mode: ‘xxx’”
Cause: You provided an invalid mode value.
Solution: Use one of the valid modes:
swanlab_mode: cloud # or: local, offline, disabled
Error: “swanlab_project cannot be an empty string”
Cause: You set swanlab_project: "" (empty string).
Solution: Either provide a valid name or remove the field:
swanlab_project: my-project
Import Errors
Error: “SwanLab is not installed”
Cause: SwanLab package is not installed in your environment.
Solution:
pip install swanlab
# Or pin a minimum version:
pip install "swanlab>=0.3.0"
Performance Issues
Warning: “Multiple logging tools enabled”
Cause: You have multiple experiment tracking tools enabled (e.g., SwanLab + WandB + MLflow).
Impact: ~1-2% performance overhead per logger, cumulative.
Solution: For production training, disable all but one logger:
# Option A: keep SwanLab
use_swanlab: true
swanlab_project: my-project
use_wandb: false # Disable others
use_mlflow: false

# Option B: keep WandB instead
use_swanlab: false
use_wandb: true
wandb_project: my-project
Exception: Multiple loggers are acceptable for:
- Short comparison runs (< 100 steps)
- Migration testing between logging tools
- Debugging logger-specific issues
Distributed Training Issues
SwanLab creates duplicate runs in multi-GPU training
Cause: All ranks are initializing SwanLab instead of just rank 0.
Expected Behavior: The plugin automatically ensures only rank 0 initializes SwanLab. You should see:
Info: Distributed training detected (world_size=4)
Info: Only rank 0 will initialize SwanLab
Info: Other ranks will skip SwanLab to avoid conflicts
If you see duplicates:
1. Check your plugin is loaded correctly
2. Verify you’re using the latest SwanLab integration code
3. Check logs for initialization messages on all ranks
SwanLab not logging metrics
Solution: Ensure SwanLab is initialized before training starts. The plugin automatically handles this in pre_model_load.
API Key errors
Solution:
echo $SWANLAB_API_KEY
swanlab login
Cloud sync issues
Solution: Use offline mode and sync later:
swanlab_mode: offline
Then sync when ready:
swanlab sync ./swanlog
Plugin not loaded
Solution: Verify plugin path in config:
plugins:
- axolotl.integrations.swanlab.SwanLabPlugin # Correct path
Lark Notification Issues
Error: “Failed to import SwanLab Lark plugin”
Cause: Your SwanLab version doesn’t include the Lark plugin (requires SwanLab >= 0.3.0).
Solution:
pip install --upgrade swanlab
pip install 'swanlab>=0.3.0'
Warning: “Lark webhook has no secret configured”
Cause: You provided swanlab_lark_webhook_url but no swanlab_lark_secret.
Impact: Lark notifications will work, but without HMAC authentication (security risk).
Solution: Add HMAC secret for production use:
swanlab_lark_webhook_url: https://open.feishu.cn/open-apis/bot/v2/hook/xxx
swanlab_lark_secret: your-webhook-secret # Add this line
When it’s OK to skip the secret:
- Local development and testing
- Internal networks with restricted access
- Non-sensitive training experiments
When the secret is required:
- Production training jobs
- Training with proprietary data
- Multi-team shared Lark groups
Error: “Failed to register Lark callback”
Cause: Invalid webhook URL or network connectivity issues.
Diagnostic steps:
curl -X POST "YOUR_WEBHOOK_URL" \
-H 'Content-Type: application/json' \
-d '{"msg_type":"text","content":{"text":"Test from Axolotl"}}'
pip show swanlab
Solution:
1. Verify webhook URL is correct (copy from Lark bot settings)
2. Check network connectivity to Lark API
3. Ensure webhook is not expired (Lark webhooks can expire)
4. Regenerate webhook URL in Lark bot settings if needed
Lark notifications not received
Cause: Multiple possible causes.
Diagnostic checklist:
1. Check training logs for Lark registration confirmation:
# Expected log message (rank 0 only):
INFO: Registered Lark notification callback with HMAC authentication
2. Verify webhook in Lark: Test the webhook manually (see above)
3. Check distributed training: Only rank 0 sends notifications
# If running multi-GPU, check rank 0 logs specifically
grep "Registered Lark" logs/rank_0.log
4. Verify SwanLab is initialized: The Lark callback needs SwanLab to be running
use_swanlab: true # Must be enabled
swanlab_project: my-project # Must be set
5. Check Lark bot permissions: Ensure the bot is added to the target group chat
Duplicate Lark notifications in multi-GPU training
Expected Behavior: Should NOT happen - only rank 0 sends notifications.
If you see duplicates:
1. Check that all GPUs are using the same config file
2. Verify plugin is loaded correctly on all ranks
3. Check logs for unexpected Lark initialization on non-zero ranks
4. Ensure RANK or LOCAL_RANK environment variables are set correctly
Solution: This is a bug if it occurs. Report it with:
- Full training command
- Logs from all ranks
- Config file
Comparison: SwanLab vs WandB
| Feature | SwanLab | WandB |
|---|---|---|
| Open Source | ✅ Yes | ❌ No |
| Self-Hosting | ✅ Easy | ⚠️ Complex |
| Free Tier | ✅ Generous | ⚠️ Limited |
| Chinese Support | ✅ Native | ⚠️ Limited |
| Offline Mode | ✅ Full support | ✅ Supported |
| Integration | 🆕 New | ✅ Mature |
Advanced Usage
Custom Logging
You can add custom metrics in your callbacks:
import swanlab
swanlab.log({
    "custom_metric": value,
    "epoch": epoch_num,
})
Experiment Comparison
swanlab compare run1 run2 run3
Support
- Documentation: https://docs.swanlab.cn
- GitHub: https://github.com/SwanHubX/SwanLab
- Issues: Report bugs at GitHub Issues
License
This integration follows the Axolotl Community License Agreement.
Acknowledgements
This integration is built on top of:
- SwanLab - Experiment tracking tool
- Transformers - SwanLabCallback
- Axolotl - Training framework
Please see reference here
Adding a new integration
Plugins can be used to customize the behavior of the training pipeline through hooks. See axolotl.integrations.BasePlugin for the possible hooks.
To add a new integration, please follow these steps:
- Create a new folder in the src/axolotl/integrations directory.
- Add any relevant files (LICENSE, README.md, ACKNOWLEDGEMENTS.md, etc.) to the new folder.
- Add __init__.py and args.py files to the new folder.
  - __init__.py should import the integration and hook into the appropriate functions.
  - args.py should define the arguments for the integration.
- (If applicable) Add CPU tests under tests/integrations or GPU tests under tests/e2e/integrations.
See src/axolotl/integrations/cut_cross_entropy for a minimal integration example.
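As a starting point, a minimal plugin skeleton might look like the following (a sketch only; the module path and class names are hypothetical, and you should verify the hook names and signatures against BasePlugin in the source):
# src/axolotl/integrations/my_integration/__init__.py (hypothetical sketch)
from axolotl.integrations.base import BasePlugin

class MyIntegrationPlugin(BasePlugin):
    """Hypothetical plugin that hooks into the training pipeline."""

    def get_input_args(self):
        # Point Axolotl at the pydantic args defined in args.py (sketch).
        return "axolotl.integrations.my_integration.args.MyIntegrationArgs"

    def pre_model_load(self, cfg):
        # Runs before the model is loaded; e.g., patch modeling code here.
        ...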
If your integration fails to load, ensure you installed Axolotl in editable mode:
pip install -e .
and that the integration name is spelled correctly in the config file:
plugins:
- axolotl.integrations.your_integration_name.YourIntegrationPlugin
It is not necessary to place your integration in the integrations folder. It can live in any location, as long as it is installed as a package in your Python environment.
See this repo for an example: https://github.com/axolotl-ai-cloud/diff-transformer