API Reference
Core
Core functionality for training
| train | Prepare and train a model on a dataset. Can also infer from a model or merge lora |
| evaluate | Module for evaluating models. |
| datasets | Module containing dataset functionality. |
| convert | Module containing File Reader, File Writer, Json Parser, and Jsonl Serializer classes |
| prompt_tokenizers | Module containing PromptTokenizingStrategy and Prompter classes |
| prompters | Module containing prompters |
| processing_strategies | Module containing ProcessingStrategy classes and its derivative for different MultiModal Model types |
| logging_config | Common logging module for axolotl. |
| core.builders.base | Base class for trainer builder |
| core.builders.causal | Builder for causal trainers |
| core.builders.rl | Builder for RLHF trainers |
| core.training_args | extra axolotl specific training args |
| core.training_args_base | Base Axolotl Training Mixins shared across various trainer configs |
| core.chat.messages | internal message representations of chat messages |
| core.chat.format.chatml | ChatML transformation functions for MessageContents |
| core.chat.format.llama3x | Llama 3.x chat formatting functions for MessageContents |
| core.chat.format.shared | shared functions for format transforms |
| core.datasets.chat | chat dataset module |
| core.datasets.transforms.chat_builder | This module contains a function that builds a transform that takes a row from the |
CLI
Command-line interface
| cli.main | Click CLI definitions for various axolotl commands. |
| cli.train | CLI to run training on a model. |
| cli.evaluate | CLI to run evaluation on a model. |
| cli.args | Module for axolotl CLI command arguments. |
| cli.art | Axolotl ASCII logo utils. |
| cli.checks | Various checks for Axolotl CLI. |
| cli.config | Configuration loading and processing. |
| cli.delinearize_llama4 | CLI tool to delinearize quantized/Linearized Llama-4 models. |
| cli.inference | CLI to run inference on a trained model. |
| cli.merge_lora | CLI to merge a trained LoRA into a base model. |
| cli.merge_sharded_fsdp_weights | CLI to merge sharded FSDP model checkpoints into a single combined checkpoint. |
| cli.preprocess | CLI to run preprocessing of a dataset. |
| cli.quantize | CLI to post-training quantize a model using torchao |
| cli.vllm_serve | CLI to start the vllm server for online RL |
| cli.agent_docs | Bundled agent documentation for axolotl. |
| cli.cloud | launch axolotl in supported cloud platforms |
| cli.cloud.base | base class for cloud platforms from cli |
| cli.cloud.baseten | Baseten Cloud CLI |
| cli.cloud.modal_ | Modal Cloud support from CLI |
| cli.utils | Init for axolotl.cli.utils module. |
| cli.utils.args | Utilities for axolotl CLI args. |
| cli.utils.diffusion | Helpers for diffusion-mode inference in CLI and Gradio. |
| cli.utils.fetch | Utilities for axolotl fetch CLI command. |
| cli.utils.load | Utilities for model, tokenizer, etc. loading. |
| cli.utils.lora_merge | |
| cli.utils.sweeps | Utilities for handling sweeps over configs for axolotl train CLI command |
| cli.utils.train | Utilities for axolotl train CLI command. |
Trainers
Training implementations
| core.trainers.base | Module for customized trainers |
| core.trainers.constants | |
| core.trainers.trl | Module for TRL RL trainers |
| core.trainers.mamba | Module for mamba trainer |
| core.trainers.dpo.args | Axolotl specific DPO args |
| core.trainers.dpo.trainer | DPO trainer for axolotl |
| core.trainers.ebft | EBFT (Energy-Based Fine-Tuning) Strategy for training |
| core.trainers.ebft.args | EBFT-specific training arguments. |
| core.trainers.ebft.kernels | Fused Triton kernels for strided EBFT. |
| core.trainers.ebft.rewards | Feature-matching reward utilities for Energy-Based Fine-Tuning (EBFT). |
| core.trainers.ebft.strided | Strided block-parallel EBFT trainer for unstructured text data. |
| core.trainers.ebft.trainer | EBFT Trainer — Energy-Based Fine-Tuning integrated via GRPOTrainer. |
| core.trainers.grpo | GRPO Specific Strategy for training |
| core.trainers.grpo.args | Axolotl Specific Training Args |
| core.trainers.grpo.trainer | Axolotl GRPO trainers (with and without sequence parallelism handling) |
| core.trainers.grpo.async_trainer | Async GRPO training with streaming scoring and IS correction. |
| core.trainers.grpo.fast_async_trainer | Experimental GRPO extensions: parallel reward workers, replay buffer, |
| core.trainers.grpo.replay_buffer | Simple replay buffer for storing and sampling high-signal rollout groups. |
| core.trainers.grpo.sampler | Repeat random sampler (similar to the one implemented in |
| core.trainers.utils | Utils for Axolotl trainers |
Model Loading
Functionality for loading and patching models, tokenizers, etc.
| loaders.model | Model loader class implementation for loading, configuring, and patching various models. |
| loaders.tokenizer | Tokenizer loading functionality and associated utils |
| loaders.processor | Processor loading functionality for multi-modal models |
| loaders.adapter | Adapter loading functionality, including LoRA / QLoRA and associated utils |
| loaders.patch_manager | Patch manager class implementation to complement axolotl.loaders.ModelLoader. |
| loaders.constants | Shared constants for axolotl.loaders module |
| loaders.utils | Utilities for axolotl.loaders module |
Mixins
Mixin classes for augmenting trainers
| core.trainers.mixins.activation_checkpointing | Trainer mixin for activation checkpointing w offloading |
| core.trainers.mixins.checkpoints | Custom handling to not fail training if fsdp optimizer is not savable |
| core.trainers.mixins.distributed_parallel | Mixin for correctly saving fsdp |
| core.trainers.mixins.layer_offloading | Trainer mixin for layer-wise parameter offloading to CPU. |
| core.trainers.mixins.optimizer | Module for Axolotl trainer optimizer mixin |
| core.trainers.mixins.packing | Trainer mixin to support packing |
| core.trainers.mixins.rng_state_loader | Temporary fix/override for bug in resume from checkpoint |
| core.trainers.mixins.scheduler | Module for Axolotl trainer scheduler mixin |
Context Managers
Context managers for altering trainer behaviors
| utils.ctx_managers.sequence_parallel | Module for Axolotl trainer sequence parallelism manager and utilities |
Prompt Strategies
Prompt formatting strategies
| prompt_strategies.base | module for base dataset transform strategies |
| prompt_strategies.chat_template | HF Chat Templates prompt strategy |
| prompt_strategies.alpaca_chat | Module for Alpaca prompt strategy classes |
| prompt_strategies.alpaca_instruct | Module loading the AlpacaInstructPromptTokenizingStrategy class |
| prompt_strategies.alpaca_w_system | Prompt strategies loader for alpaca instruction datasets with system prompts |
| prompt_strategies.user_defined | User Defined prompts with configuration from the YML config |
| prompt_strategies.llama2_chat | Prompt Strategy for finetuning Llama2 chat models |
| prompt_strategies.completion | Basic completion text |
| prompt_strategies.context_qa | Module containing the classes for Context QA Prompt Tokenization Strategies |
| prompt_strategies.creative_acr | Module loading the CreativePromptTokenizingStrategy and similar classes |
| prompt_strategies.input_output | Module for plain input/output prompt pairs |
| prompt_strategies.pretrain | pretraining prompt strategies |
| prompt_strategies.stepwise_supervised | Module for stepwise datasets, typically including a prompt and reasoning traces, |
| prompt_strategies.metharme | Module containing the MetharmenPromptTokenizingStrategy and MetharmePrompter class |
| prompt_strategies.orcamini | Prompt Strategy for finetuning Orca Mini (v2) models |
| prompt_strategies.pygmalion | Module containing the PygmalionPromptTokenizingStrategy and PygmalionPrompter class |
| prompt_strategies.messages.chat | Chat dataset wrapping strategy for new internal messages representations |
| prompt_strategies.ebft.ebft_chat_multiturn | Dataset transform for multi-turn chat data with structured EBFT (vLLM mode). |
| prompt_strategies.ebft.ebft_opencode | Dataset transform for nvidia/OpenCodeInstruct with EBFT structured mode. |
| prompt_strategies.ebft.ebft_reasoning | Dataset transform for reasoning/thinking datasets with EBFT. |
| prompt_strategies.ebft.ebft_strided_chat | Dataset transform for multi-turn chat data with strided EBFT. |
| prompt_strategies.ebft.ebft_strided_structured | Dataset transform for structured (prompt, completion) data with strided EBFT. |
| prompt_strategies.dpo.chat_template | DPO prompt strategies for using tokenizer chat templates. |
| prompt_strategies.dpo.llama3 | DPO strategies for llama-3 chat template |
| prompt_strategies.dpo.chatml | DPO strategies for chatml |
| prompt_strategies.dpo.zephyr | DPO strategies for zephyr |
| prompt_strategies.dpo.user_defined | User-defined DPO strategies |
| prompt_strategies.dpo.passthrough | DPO prompt strategies passthrough/zero-processing strategy |
| prompt_strategies.kto.llama3 | KTO strategies for llama-3 chat template |
| prompt_strategies.kto.chatml | KTO strategies for chatml |
| prompt_strategies.kto.user_defined | User-defined KTO strategies |
| prompt_strategies.orpo.chat_template | chatml prompt tokenization strategy for ORPO |
| prompt_strategies.bradley_terry.chat_template | Bradley-Terry model with chat template prompt strategy. |
| prompt_strategies.bradley_terry.llama3 | chatml transforms for datasets with system, input, chosen, rejected to match llama3 chat template |
Kernels
Low-level performance optimizations
| kernels.lora | Module for definition of Low-Rank Adaptation (LoRA) Triton kernels. |
| kernels.dora | Triton kernels for DoRA (Weight-Decomposed Low-Rank Adaptation). |
| kernels.geglu | Module for definition of GEGLU Triton kernels. |
| kernels.swiglu | Module for definition of SwiGLU Triton kernels. |
| kernels.quantize | Dequantization utilities for bitsandbytes and FP8 integration. |
| kernels.autotune_telemetry | Telemetry for the fused RMSNorm+RoPE Triton autotune selections. |
| kernels.gemma4_fused_rope | Fused RMSNorm + (partial) RoPE Triton kernel for Gemma 4 / Qwen3 Q/K paths. |
| kernels.rms_norm_gated | Fused RMSNorm + SiLU Gate Triton kernel. |
| kernels.utils | Utilities for axolotl.kernels submodules. |
Monkey Patches
Runtime patches for model optimizations
| monkeypatch.llama_attn_hijack_flash | Flash attention monkey patch for llama model |
| monkeypatch.llama_attn_hijack_xformers | Directly copied the code from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py and made some adjustments |
| monkeypatch.mistral_attn_hijack_flash | Flash attention monkey patch for mistral model |
| monkeypatch.multipack | multipack patching for v2 of sample packing |
| monkeypatch.relora | Implements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695, minus the initial full fine-tune. |
| monkeypatch.lora_kernels | Module for patching custom LoRA Triton kernels and torch.autograd functions. |
| monkeypatch.utils | Shared utils for the monkeypatches |
| monkeypatch.btlm_attn_hijack_flash | Flash attention monkey patch for cerebras btlm model |
| monkeypatch.stablelm_attn_hijack_flash | PyTorch StableLM Epoch model. |
| monkeypatch.transformers_fa_utils | see https://github.com/huggingface/transformers/pull/35834 |
| monkeypatch.data.batch_dataset_fetcher | Monkey patches for the dataset fetcher to handle batches of packed indexes. |
| monkeypatch.mixtral | Patches to support multipack for mixtral |
| monkeypatch.gradient_checkpointing.offload_cpu | CPU offloaded checkpointing |
| monkeypatch.gradient_checkpointing.offload_disk | DISCO - DIsk-based Storage and Checkpointing with Optimized prefetching |
| monkeypatch.deepspeed_utils | |
| monkeypatch.fsdp2_qlora | Monkeypatch to add Params4bit and Int8Params support to FSDP2. This enables QLoRA + FSDP2 |
| monkeypatch.gemma4_hybrid_mask | Hybrid attention mask fix for Gemma 4 (standard and unified). |
| monkeypatch.gemma4_loss_kwargs | Flip accepts_loss_kwargs to True on Gemma 4 (Unified) ForConditionalGeneration. |
| monkeypatch.kernelize_fixes | Repairs for transformers’ model.kernelize() under use_kernels=True. |
| monkeypatch.moe_quant | Loading-time quantization for MoE expert weights stored as 3D nn.Parameter tensors. |
| monkeypatch.scaled_softmax_attn | Scaled Softmax (SSMax) attention patch using FlexAttention. |
| monkeypatch.torchao_optim | Patch for torchao optim subclasses that crash under torch.compile. |
| monkeypatch.trainer_accelerator_args | allow adding additional kwargs to Accelerator init |
| monkeypatch.accelerate.fsdp2 | monkeypatch for accelerate fsdp2 fix when modifying ordereddict during interation, and saving full state dicts |
| monkeypatch.accelerate.parallelism_config | ParallelismConfig monkeypatch. |
| monkeypatch.attention.flash_attn_4 | Transparently upgrade FA2 to FA4 when available on SM90+ hardware. |
| monkeypatch.attention.flex_attn | Flex attention monkey patch |
| monkeypatch.attention.fp8_attn | FP8 low-precision attention via torchao. |
| monkeypatch.attention.sage_attn | Monkeypatch for SageAttention for use with transformers. |
| monkeypatch.attention.xformers | xformers attention implementation for packing |
| monkeypatch.loss.chunked | chunked ce loss |
| monkeypatch.loss.eaft | eaft (entropy-aware focal training) loss implementation |
| monkeypatch.models.apertus.activation | Monkeypatch for Apertus to dtype mismatch in XIELU act |
| monkeypatch.models.falcon_h1.modeling | Sample-packing and context-parallelism patch for Falcon-H1 (parallel Mamba2/Attention hybrid). |
| monkeypatch.models.gemma4_unified.fused_attn | Gemma 4 Unified fused attention monkeypatch. |
| monkeypatch.models.granitemoehybrid.modeling | Sample-packing and context-parallelism patch for Granite MoE Hybrid (Mamba2/Attention/MoE). |
| monkeypatch.models.kimi_linear.patch_kimi_linear | |
| monkeypatch.models.llama4.modeling | Modified Llama-4 text experts modeling for linearized experts for improved LoRA support |
| monkeypatch.models.mamba_utils | Shared utilities for Mamba2 SSM sample-packing and context-parallelism patches. |
| monkeypatch.models.mistral3.mistral_common_tokenizer | Monkeypatch to fix inefficient tensor conversion in MistralCommonBackend.apply_chat_template |
| monkeypatch.models.nemotron_h.modeling | Sample-packing and context-parallelism patch for NemotronH (Mamba2/Attention/MoE hybrid). |
| monkeypatch.models.pixtral.modeling_flash_attention_utils | Monkeypatch for FA utils to accept 1D position_ids from Pixtral’s position_ids_in_meshgrid |
| monkeypatch.models.qwen3.fused_attn | Fuse q_norm/k_norm + RoPE in Qwen3Attention.forward via one Triton kernel. |
| monkeypatch.models.qwen3_5.fused_attn | Fused q_norm/k_norm + RoPE for Qwen3.5 (gated q_proj, unit_offset=True RMSNorm). |
| monkeypatch.models.qwen3_5.modeling | Monkeypatch for Qwen3_5 and Qwen3_5Moe models to pass position_ids to linear attention. |
| monkeypatch.models.qwen3_5_moe.fused_attn | Qwen3.5-MoE variant of the qwen3_5 fused-attention monkeypatch. |
| monkeypatch.models.qwen3_moe.fused_attn | Qwen3-MoE variant of the qwen3 fused-attention monkeypatch. |
| monkeypatch.models.qwen3_next.modeling | Monkeypatch for Qwen3_Next model to pass position_ids to linear attention. |
| monkeypatch.models.qwen3_vl.fused_attn | |
| monkeypatch.models.voxtral.modeling | Monkeypatch for voxtral to fix leaf node and dtype mismatch |
| monkeypatch.peft.utils | Patch prepare_model_for_kbit_training to not upcast everything |
| monkeypatch.ring_attn.adapters.batch | HuggingFace flash attention adapter for basic ring attention (batch API). |
| monkeypatch.ring_attn.patch | Ring attention group registration and flash attention patching. |
| monkeypatch.tiled_mlp.base | TiledMLP support for DDP, FSDP, and single GPU |
| monkeypatch.tiled_mlp.patch | Monkeypatch for Tiled MLP implementation |
| monkeypatch.trainer.lr | monkeypatch for Trainer _get_learning_rate method |
| monkeypatch.trainer.trl | Monkeypatch for TRL trainer FSDP preparation. |
| monkeypatch.trainer.trl_vllm | Monkeypatches for TRL’s vLLM integration and trainer utils. |
| monkeypatch.trainer.utils | |
| monkeypatch.transformers.trainer_loss_calc | Module for patching transformers Trainer loss calculation to use nanmean. |
| monkeypatch.xformers_ | Fused MLP layer for incrementally improved training efficiency |
Utils
Utility functions
| utils.tokenization | Module for tokenization utilities |
| utils.chat_templates | This module provides functionality for selecting chat templates based on user choices. |
| utils.chat_templates.base | utility functions for chat templates |
| utils.lora | module to get the state dict of a merged lora model |
| utils.model_shard_quant | module to handle loading model on cpu/meta device for FSDP |
| utils.bench | Benchmarking and measurement utilities |
| utils.comet_ | Module for wandb utilities |
| utils.config | Module for working with config dicts |
| utils.cuda13 | Helpers for CUDA 13 uv images. |
| utils.datasets | helper functions for datasets |
| utils.environment | utils to get GPU info for the current environment |
| utils.fp32_norms | Helpers for keeping selected norm modules in fp32 under FSDP2. |
| utils.freeze | module to freeze/unfreeze parameters by name |
| utils.import_helper | Helper for importing modules from strings |
| utils.logging | Logging helpers to only log on main process. |
| utils.mlflow_ | Module for mlflow utilities |
| utils.tee | Utilities for managing the debug log file and providing a file-only stream for logging |
| utils.trackio_ | Module for trackio utilities |
| utils.train | Training utils for checkpoints |
| utils.trainer | Module containing the Trainer class and related functions |
| utils.wandb_ | Module for wandb utilities |
| utils.weight_serde | Serialize / deserialize tensors for HTTP and IPC weight sync. |
| utils.schedulers | Module for custom LRScheduler class |
| utils.distributed | Utilities for distributed functionality. |
| utils.dict | Module containing the DictDefault class |
| utils.generation.sft | Sample generation utilities for SFT/Pretrain training. |
| utils.mistral.mistral3_processor | Processor for Mistral3 multimodal models with image support |
| utils.mistral.mistral_tokenizer | Wrapper for MistralTokenizer from mistral-common |
| utils.optimizers.adopt | Copied from https://github.com/iShohei220/adopt |
| utils.optimizers.qgalore | Helpers for the Q-GaLore optimizer integration. |
| utils.data.streaming | Data handling specific to streaming datasets. |
| utils.data.sft | Data handling specific to SFT. |
| utils.data.rl | Data handling specific to RL trainers. |
| utils.data.lock | Logic for loading / preparing a dataset once over all processes. |
| utils.data.utils | Data handling helpers |
| utils.data.wrappers | Data handling specific to SFT. |
| utils.quantization | Utilities for quantization including QAT and PTQ using torchao. |
Schemas
Pydantic data models for Axolotl config
| utils.schemas.config | Module with Pydantic models for configuration. |
| utils.schemas.model | Pydantic models for model input / output, etc. configuration |
| utils.schemas.training | Pydantic models for training hyperparameters |
| utils.schemas.datasets | Pydantic models for datasets-related configuration |
| utils.schemas.peft | Pydantic models for PEFT-related configuration |
| utils.schemas.trl | Pydantic models for TRL trainer configuration |
| utils.schemas.multimodal | Pydantic models for multimodal-related configuration |
| utils.schemas.integrations | Pydantic models for Axolotl integrations |
| utils.schemas.deprecated | Pydantic models for deprecated and remapped configuration parameters |
| utils.schemas.dynamic_checkpoint | Schema for dynamic checkpoint configuration. |
| utils.schemas.fsdp | FSDP Configuration Schema |
| utils.schemas.quantization | QAT Config Schema |
| utils.schemas.validation | Module with validation methods for config pydantic model. |
| utils.schemas.vllm | Pydantic models for VLLM configuration, used primarily for RL training with TRL + grpo |
| utils.schemas.enums | Enums for Axolotl input config |
| utils.schemas.utils | Utilities for Axolotl Pydantic models |
Integrations
Third-party integrations and extensions
| integrations.base | Base class for all plugins. |
| integrations.config | Module to handle merging the plugins’ input arguments with the base configurations. |
| integrations.cut_cross_entropy | Module for the Plugin for Cut Cross Entropy integration with Axolotl. |
| integrations.cut_cross_entropy.args | Module for handling Cut Cross Entropy input arguments. |
| integrations.densemixer.args | Pydantic models for DenseMixer plugin |
| integrations.densemixer.plugin | DenseMixer plugin for Axolotl |
| integrations.diffusion.args | Config args for diffusion LM training (nested under diffusion:). |
| integrations.diffusion.callbacks | Callbacks for diffusion training. |
| integrations.diffusion.generation | Sample generation utilities for diffusion training. |
| integrations.diffusion.plugin | Diffusion LM training plugin for Axolotl. |
| integrations.diffusion.trainer | Custom trainer for diffusion LM training. |
| integrations.diffusion.utils | Shared utilities for diffusion integration. |
| integrations.expert_parallel.args | Pydantic args for the Expert-Parallel (DeepEP) plugin. |
| integrations.expert_parallel.buffer | DeepEP Buffer singleton, lazily constructed on first call. |
| integrations.expert_parallel.experts_fn | DeepEP-backed registered functions for ALL_EXPERTS_FUNCTIONS. |
| integrations.expert_parallel.plugin | Expert-Parallel (DeepEP) plugin for axolotl. |
| integrations.expert_parallel.shard | Generic expert-weight sharding for @use_experts_implementation modules. |
| integrations.grokfast.args | config args for grokfast plugin |
| integrations.grokfast.optimizer | |
| integrations.hatchery.args | Pydantic config schema for the Hatchery integration. |
| integrations.hatchery.data | Convert axolotl batch tensors to Tinker/Hatchery Datum format. |
| integrations.hatchery.plugin | Axolotl plugin that routes training to a remote Hatchery/Tinker API. |
| integrations.hatchery.rewards.math_reward | Math reward function for hendrycks_math GRPO training. |
| integrations.hatchery.rl_trainer | Remote RL trainer (GRPO/PPO) using Tinker or Hatchery API. |
| integrations.hatchery.trainer | Remote trainer that dispatches to Tinker or Hatchery API. |
| integrations.kd | Plugin init to add KD support to Axolotl. |
| integrations.kd.args | Plugin args for KD support. |
| integrations.kd.callbacks | Transformers trainer callbacks to schedule the KD temperature during training |
| integrations.kd.chat_template | Chat template prompt strategy loader with KD support |
| integrations.kd.collator | DataCollator for axolotl to handle KD fields without using -inf for padding, |
| integrations.kd.collator_online_teacher | Packed data loader for online teacher training supporting vllm and sglang. |
| integrations.kd.kernels.liger | Liger Kernels for Chunked Top-K Log-Prob Distillation |
| integrations.kd.topk_logprob.forward_kl | loss for top_k KL divergence |
| integrations.kd.trainer | KD trainer |
| integrations.kd.utils | Helper KD utils |
| integrations.kernels.args | |
| integrations.kernels.autotune_callback | Trainer callback for reporting Triton autotune results from scattermoe-lora kernels. |
| integrations.kernels.autotune_collector | Collect Triton autotune results from scattermoe-lora kernels. |
| integrations.kernels.constants | Diagnostic helpers for MoE kernel integrations (kernel dispatch itself |
| integrations.kernels.plugin | |
| integrations.liger.args | Module for handling LIGER input arguments. |
| integrations.liger.plugin | Liger-Kernel Plugin for Axolotl |
| integrations.liger.utils | utils to patch liger kernel ops to disable torch.compile |
| integrations.liger.models.base | Generic FLCE patch for untested models similar to Llama |
| integrations.liger.models.deepseekv2 | DeepseekV2 model with LigerFusedLinearCrossEntropyLoss |
| integrations.liger.models.jamba | Jamba model with LigerFusedLinearCrossEntropyLoss |
| integrations.liger.models.llama4 | Liger FLCE for llama4 |
| integrations.liger.models.qwen3 | Liger FLCE for Qwen3. Based on transformers v4.51.3. |
| integrations.liger.models.qwen3_5 | Liger FLCE for Qwen3.5. Based on transformers v5.3.0. |
| integrations.liger.models.qwen3_5_moe | Liger FLCE for Qwen3.5 MoE. Based on transformers v5.3.0. |
| integrations.liger.models.qwen3_moe | Liger FLCE for Qwen3 MoE. Based on transformers v4.51.3. |
| integrations.llm_compressor.args | LLMCompressor and Sparse Finetuning config models. |
| integrations.llm_compressor.plugin | Sparse Finetuning plugin for Axolotl — enables handling of sparse neural networks |
| integrations.llm_compressor.utils | Utilities for llmcompressor integration with axolotl. |
| integrations.lm_eval.args | Module for handling lm eval harness input arguments. |
| integrations.lm_eval.cli | axolotl CLI for running lm_eval tasks |
| integrations.mora.args | Config args for MoRA / ReMoRA. |
| integrations.mora.plugin | MoRA / ReMoRA plugin for Axolotl. |
| integrations.nemo_gym.args | Input arguments for the NeMo Gym integration plugin. |
| integrations.nemo_gym.data_producer | NeMo Gym Data Producer for async GRPO training. |
| integrations.nemo_gym.dataset | Dataset loading for NeMo Gym JSONL files. |
| integrations.nemo_gym.multi_turn | Multi-turn rollout function for NeMo Gym environments. |
| integrations.nemo_gym.plugin | NeMo Gym Plugin for Axolotl. |
| integrations.nemo_gym.rewards | NeMo Gym reward functions. |
| integrations.nemo_gym.server | NeMo Gym server lifecycle management. |
| integrations.spectrum | Spectrum Plugin to automatically generate unfrozen parameters based on SNR data. |
| integrations.spectrum.args | Module for handling Spectrum input arguments. |
| integrations.swanlab.args | SwanLab configuration arguments |
| integrations.swanlab.callbacks | SwanLab callbacks for Axolotl trainers. |
| integrations.swanlab.completion_logger | SwanLab completion logger for RLHF/DPO/KTO/ORPO/GRPO training. |
| integrations.swanlab.plugins | SwanLab Plugin for Axolotl |
Common
Common utilities and shared functionality
| common.architectures | Common architecture specific constants |
| common.const | Various shared constants |
| common.datasets | Dataset loading utilities. |
Models
Custom model implementations
| models.mamba.configuration_mamba | HF Transformers MambaConfig |
| models.mamba.modeling_mamba |
Data Processing
Data processing utilities
| utils.collators.core | basic shared collator constants |
| utils.collators.batching | Data collators for axolotl to pad labels and position_ids for packed sequences |
| utils.collators.dpo | DPO/ORPO/IPO/KTO data collator with pad_to_multiple_of support. |
| utils.collators.mamba | collators for Mamba |
| utils.collators.mm_chat | Collators for multi-modal chat messages and packing |
| utils.samplers.multipack | Multipack Batch Sampler - An efficient batch sampler for packing variable-length sequences |
| utils.samplers.utils | helper util to calculate dataset lengths |
Callbacks
Training callbacks
| utils.callbacks | Callbacks for Trainer class |
| utils.callbacks.perplexity | callback to calculate perplexity as an evaluation metric. |
| utils.callbacks.profiler | HF Trainer callback for creating pytorch profiling snapshots |
| utils.callbacks.lisa | module for LISA |
| utils.callbacks.mlflow_ | MLFlow module for trainer callbacks |
| utils.callbacks.comet_ | Comet module for trainer callbacks |
| utils.callbacks.qat | QAT Callback for HF Causal Trainer |
| utils.callbacks.dynamic_checkpoint | |
| utils.callbacks.generation | Callback for generating samples during SFT/Pretrain training. |
| utils.callbacks.models | Helper functions for model classes |
| utils.callbacks.opentelemetry | OpenTelemetry metrics callback for Axolotl training |
| utils.callbacks.swanlab | Callbacks for SwanLab integration |
| utils.callbacks.tokens_per_second | A callback for calculating tokens per second during training. |
| utils.callbacks.trackio_ | Trackio module for trainer callbacks |
Scripts
Standalone helper scripts
| scripts.process_cleanup | Reusable process lifecycle management for vLLM serve scripts. |
| scripts.vllm_serve_lora | vLLM serve script with native LoRA adapter support. |
| scripts.vllm_worker_ext | Extended vLLM worker extension with batch weight sync support. |
Telemetry
Usage telemetry
| telemetry.callbacks | Trainer callbacks for reporting runtime metrics at regular intervals. |
| telemetry.errors | Telemetry utilities for exception and traceback information. |
| telemetry.manager | Telemetry manager and associated utilities. |
| telemetry.runtime_metrics | Telemetry utilities for runtime and memory metrics. |