API Reference

Core

Core functionality for training

train Prepare and train a model on a dataset. Can also run inference from a model or merge a LoRA
evaluate Module for evaluating models.
datasets Module containing dataset functionality.
convert Module containing File Reader, File Writer, Json Parser, and Jsonl Serializer classes
prompt_tokenizers Module containing PromptTokenizingStrategy and Prompter classes
prompters Module containing prompters
processing_strategies Module containing ProcessingStrategy classes and their derivatives for different multimodal model types
logging_config Common logging module for axolotl.
core.builders.base Base class for trainer builder
core.builders.causal Builder for causal trainers
core.builders.rl Builder for RLHF trainers
core.training_args extra axolotl specific training args
core.training_args_base Base Axolotl Training Mixins shared across various trainer configs
core.chat.messages internal message representations of chat messages
core.chat.format.chatml ChatML transformation functions for MessageContents
core.chat.format.llama3x Llama 3.x chat formatting functions for MessageContents
core.chat.format.shared shared functions for format transforms
core.datasets.chat chat dataset module
core.datasets.transforms.chat_builder This module contains a function that builds a transform that takes a row from the

CLI

Command-line interface

cli.main Click CLI definitions for various axolotl commands.
cli.train CLI to run training on a model.
cli.evaluate CLI to run evaluation on a model.
cli.args Module for axolotl CLI command arguments.
cli.art Axolotl ASCII logo utils.
cli.checks Various checks for Axolotl CLI.
cli.config Configuration loading and processing.
cli.delinearize_llama4 CLI tool to delinearize quantized/Linearized Llama-4 models.
cli.inference CLI to run inference on a trained model.
cli.merge_lora CLI to merge a trained LoRA into a base model.
cli.merge_sharded_fsdp_weights CLI to merge sharded FSDP model checkpoints into a single combined checkpoint.
cli.preprocess CLI to run preprocessing of a dataset.
cli.quantize CLI to post-training quantize a model using torchao
cli.vllm_serve CLI to start the vllm server for online RL
cli.agent_docs Bundled agent documentation for axolotl.
cli.cloud launch axolotl on supported cloud platforms
cli.cloud.base base class for cloud platforms from cli
cli.cloud.baseten Baseten Cloud CLI
cli.cloud.modal_ Modal Cloud support from CLI
cli.utils Init for axolotl.cli.utils module.
cli.utils.args Utilities for axolotl CLI args.
cli.utils.diffusion Helpers for diffusion-mode inference in CLI and Gradio.
cli.utils.fetch Utilities for axolotl fetch CLI command.
cli.utils.load Utilities for model, tokenizer, etc. loading.
cli.utils.lora_merge
cli.utils.sweeps Utilities for handling sweeps over configs for axolotl train CLI command
cli.utils.train Utilities for axolotl train CLI command.

Trainers

Training implementations

core.trainers.base Module for customized trainers
core.trainers.constants
core.trainers.trl Module for TRL RL trainers
core.trainers.mamba Module for mamba trainer
core.trainers.dpo.args Axolotl specific DPO args
core.trainers.dpo.trainer DPO trainer for axolotl
core.trainers.ebft EBFT (Energy-Based Fine-Tuning) Strategy for training
core.trainers.ebft.args EBFT-specific training arguments.
core.trainers.ebft.kernels Fused Triton kernels for strided EBFT.
core.trainers.ebft.rewards Feature-matching reward utilities for Energy-Based Fine-Tuning (EBFT).
core.trainers.ebft.strided Strided block-parallel EBFT trainer for unstructured text data.
core.trainers.ebft.trainer EBFT Trainer — Energy-Based Fine-Tuning integrated via GRPOTrainer.
core.trainers.grpo GRPO Specific Strategy for training
core.trainers.grpo.args Axolotl Specific Training Args
core.trainers.grpo.trainer Axolotl GRPO trainers (with and without sequence parallelism handling)
core.trainers.grpo.async_trainer Async GRPO training with streaming scoring and IS correction.
core.trainers.grpo.fast_async_trainer Experimental GRPO extensions: parallel reward workers, replay buffer,
core.trainers.grpo.replay_buffer Simple replay buffer for storing and sampling high-signal rollout groups.
core.trainers.grpo.sampler Repeat random sampler (similar to the one implemented in
core.trainers.utils Utils for Axolotl trainers

Model Loading

Functionality for loading and patching models, tokenizers, etc.

loaders.model Model loader class implementation for loading, configuring, and patching various models.
loaders.tokenizer Tokenizer loading functionality and associated utils
loaders.processor Processor loading functionality for multi-modal models
loaders.adapter Adapter loading functionality, including LoRA / QLoRA and associated utils
loaders.patch_manager Patch manager class implementation to complement axolotl.loaders.ModelLoader.
loaders.constants Shared constants for axolotl.loaders module
loaders.utils Utilities for axolotl.loaders module

Mixins

Mixin classes for augmenting trainers

core.trainers.mixins.activation_checkpointing Trainer mixin for activation checkpointing with offloading
core.trainers.mixins.checkpoints Custom handling to not fail training if fsdp optimizer is not savable
core.trainers.mixins.distributed_parallel Mixin for correctly saving fsdp
core.trainers.mixins.layer_offloading Trainer mixin for layer-wise parameter offloading to CPU.
core.trainers.mixins.optimizer Module for Axolotl trainer optimizer mixin
core.trainers.mixins.packing Trainer mixin to support packing
core.trainers.mixins.rng_state_loader Temporary fix/override for bug in resume from checkpoint
core.trainers.mixins.scheduler Module for Axolotl trainer scheduler mixin

Context Managers

Context managers for altering trainer behaviors

utils.ctx_managers.sequence_parallel Module for Axolotl trainer sequence parallelism manager and utilities

Prompt Strategies

Prompt formatting strategies

prompt_strategies.base module for base dataset transform strategies
prompt_strategies.chat_template HF Chat Templates prompt strategy
prompt_strategies.alpaca_chat Module for Alpaca prompt strategy classes
prompt_strategies.alpaca_instruct Module loading the AlpacaInstructPromptTokenizingStrategy class
prompt_strategies.alpaca_w_system Prompt strategies loader for alpaca instruction datasets with system prompts
prompt_strategies.user_defined User Defined prompts with configuration from the YML config
prompt_strategies.llama2_chat Prompt Strategy for finetuning Llama2 chat models
prompt_strategies.completion Basic completion text
prompt_strategies.context_qa Module containing the classes for Context QA Prompt Tokenization Strategies
prompt_strategies.creative_acr Module loading the CreativePromptTokenizingStrategy and similar classes
prompt_strategies.input_output Module for plain input/output prompt pairs
prompt_strategies.pretrain pretraining prompt strategies
prompt_strategies.stepwise_supervised Module for stepwise datasets, typically including a prompt and reasoning traces,
prompt_strategies.metharme Module containing the MetharmePromptTokenizingStrategy and MetharmePrompter classes
prompt_strategies.orcamini Prompt Strategy for finetuning Orca Mini (v2) models
prompt_strategies.pygmalion Module containing the PygmalionPromptTokenizingStrategy and PygmalionPrompter classes
prompt_strategies.messages.chat Chat dataset wrapping strategy for new internal messages representations
prompt_strategies.ebft.ebft_chat_multiturn Dataset transform for multi-turn chat data with structured EBFT (vLLM mode).
prompt_strategies.ebft.ebft_opencode Dataset transform for nvidia/OpenCodeInstruct with EBFT structured mode.
prompt_strategies.ebft.ebft_reasoning Dataset transform for reasoning/thinking datasets with EBFT.
prompt_strategies.ebft.ebft_strided_chat Dataset transform for multi-turn chat data with strided EBFT.
prompt_strategies.ebft.ebft_strided_structured Dataset transform for structured (prompt, completion) data with strided EBFT.
prompt_strategies.dpo.chat_template DPO prompt strategies for using tokenizer chat templates.
prompt_strategies.dpo.llama3 DPO strategies for llama-3 chat template
prompt_strategies.dpo.chatml DPO strategies for chatml
prompt_strategies.dpo.zephyr DPO strategies for zephyr
prompt_strategies.dpo.user_defined User-defined DPO strategies
prompt_strategies.dpo.passthrough DPO prompt strategies passthrough/zero-processing strategy
prompt_strategies.kto.llama3 KTO strategies for llama-3 chat template
prompt_strategies.kto.chatml KTO strategies for chatml
prompt_strategies.kto.user_defined User-defined KTO strategies
prompt_strategies.orpo.chat_template chatml prompt tokenization strategy for ORPO
prompt_strategies.bradley_terry.chat_template Bradley-Terry model with chat template prompt strategy.
prompt_strategies.bradley_terry.llama3 chatml transforms for datasets with system, input, chosen, rejected to match llama3 chat template

Kernels

Low-level performance optimizations

kernels.lora Module for definition of Low-Rank Adaptation (LoRA) Triton kernels.
kernels.dora Triton kernels for DoRA (Weight-Decomposed Low-Rank Adaptation).
kernels.geglu Module for definition of GEGLU Triton kernels.
kernels.swiglu Module for definition of SwiGLU Triton kernels.
kernels.quantize Dequantization utilities for bitsandbytes and FP8 integration.
kernels.autotune_telemetry Telemetry for the fused RMSNorm+RoPE Triton autotune selections.
kernels.gemma4_fused_rope Fused RMSNorm + (partial) RoPE Triton kernel for Gemma 4 / Qwen3 Q/K paths.
kernels.rms_norm_gated Fused RMSNorm + SiLU Gate Triton kernel.
kernels.utils Utilities for axolotl.kernels submodules.

Monkey Patches

Runtime patches for model optimizations

monkeypatch.llama_attn_hijack_flash Flash attention monkey patch for llama model
monkeypatch.llama_attn_hijack_xformers Directly copied the code from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py and made some adjustments
monkeypatch.mistral_attn_hijack_flash Flash attention monkey patch for mistral model
monkeypatch.multipack multipack patching for v2 of sample packing
monkeypatch.relora Implements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695, minus the initial full fine-tune.
monkeypatch.lora_kernels Module for patching custom LoRA Triton kernels and torch.autograd functions.
monkeypatch.utils Shared utils for the monkeypatches
monkeypatch.btlm_attn_hijack_flash Flash attention monkey patch for cerebras btlm model
monkeypatch.stablelm_attn_hijack_flash PyTorch StableLM Epoch model.
monkeypatch.transformers_fa_utils see https://github.com/huggingface/transformers/pull/35834
monkeypatch.data.batch_dataset_fetcher Monkey patches for the dataset fetcher to handle batches of packed indexes.
monkeypatch.mixtral Patches to support multipack for mixtral
monkeypatch.gradient_checkpointing.offload_cpu CPU offloaded checkpointing
monkeypatch.gradient_checkpointing.offload_disk DISCO - DIsk-based Storage and Checkpointing with Optimized prefetching
monkeypatch.deepspeed_utils
monkeypatch.fsdp2_qlora Monkeypatch to add Params4bit and Int8Params support to FSDP2. This enables QLoRA + FSDP2
monkeypatch.gemma4_hybrid_mask Hybrid attention mask fix for Gemma 4 (standard and unified).
monkeypatch.gemma4_loss_kwargs Flip accepts_loss_kwargs to True on Gemma 4 (Unified) ForConditionalGeneration.
monkeypatch.kernelize_fixes Repairs for transformers’ model.kernelize() under use_kernels=True.
monkeypatch.moe_quant Loading-time quantization for MoE expert weights stored as 3D nn.Parameter tensors.
monkeypatch.scaled_softmax_attn Scaled Softmax (SSMax) attention patch using FlexAttention.
monkeypatch.torchao_optim Patch for torchao optim subclasses that crash under torch.compile.
monkeypatch.trainer_accelerator_args allow adding additional kwargs to Accelerator init
monkeypatch.accelerate.fsdp2 monkeypatch for accelerate fsdp2 fix when modifying an OrderedDict during iteration, and saving full state dicts
monkeypatch.accelerate.parallelism_config ParallelismConfig monkeypatch.
monkeypatch.attention.flash_attn_4 Transparently upgrade FA2 to FA4 when available on SM90+ hardware.
monkeypatch.attention.flex_attn Flex attention monkey patch
monkeypatch.attention.fp8_attn FP8 low-precision attention via torchao.
monkeypatch.attention.sage_attn Monkeypatch for SageAttention for use with transformers.
monkeypatch.attention.xformers xformers attention implementation for packing
monkeypatch.loss.chunked chunked ce loss
monkeypatch.loss.eaft eaft (entropy-aware focal training) loss implementation
monkeypatch.models.apertus.activation Monkeypatch for Apertus to fix dtype mismatch in XIELU activation
monkeypatch.models.falcon_h1.modeling Sample-packing and context-parallelism patch for Falcon-H1 (parallel Mamba2/Attention hybrid).
monkeypatch.models.gemma4_unified.fused_attn Gemma 4 Unified fused attention monkeypatch.
monkeypatch.models.granitemoehybrid.modeling Sample-packing and context-parallelism patch for Granite MoE Hybrid (Mamba2/Attention/MoE).
monkeypatch.models.kimi_linear.patch_kimi_linear
monkeypatch.models.llama4.modeling Modified Llama-4 text experts modeling for linearized experts for improved LoRA support
monkeypatch.models.mamba_utils Shared utilities for Mamba2 SSM sample-packing and context-parallelism patches.
monkeypatch.models.mistral3.mistral_common_tokenizer Monkeypatch to fix inefficient tensor conversion in MistralCommonBackend.apply_chat_template
monkeypatch.models.nemotron_h.modeling Sample-packing and context-parallelism patch for NemotronH (Mamba2/Attention/MoE hybrid).
monkeypatch.models.pixtral.modeling_flash_attention_utils Monkeypatch for FA utils to accept 1D position_ids from Pixtral’s position_ids_in_meshgrid
monkeypatch.models.qwen3.fused_attn Fuse q_norm/k_norm + RoPE in Qwen3Attention.forward via one Triton kernel.
monkeypatch.models.qwen3_5.fused_attn Fused q_norm/k_norm + RoPE for Qwen3.5 (gated q_proj, unit_offset=True RMSNorm).
monkeypatch.models.qwen3_5.modeling Monkeypatch for Qwen3_5 and Qwen3_5Moe models to pass position_ids to linear attention.
monkeypatch.models.qwen3_5_moe.fused_attn Qwen3.5-MoE variant of the qwen3_5 fused-attention monkeypatch.
monkeypatch.models.qwen3_moe.fused_attn Qwen3-MoE variant of the qwen3 fused-attention monkeypatch.
monkeypatch.models.qwen3_next.modeling Monkeypatch for Qwen3_Next model to pass position_ids to linear attention.
monkeypatch.models.qwen3_vl.fused_attn
monkeypatch.models.voxtral.modeling Monkeypatch for voxtral to fix leaf node and dtype mismatch
monkeypatch.peft.utils Patch prepare_model_for_kbit_training to not upcast everything
monkeypatch.ring_attn.adapters.batch HuggingFace flash attention adapter for basic ring attention (batch API).
monkeypatch.ring_attn.patch Ring attention group registration and flash attention patching.
monkeypatch.tiled_mlp.base TiledMLP support for DDP, FSDP, and single GPU
monkeypatch.tiled_mlp.patch Monkeypatch for Tiled MLP implementation
monkeypatch.trainer.lr monkeypatch for Trainer _get_learning_rate method
monkeypatch.trainer.trl Monkeypatch for TRL trainer FSDP preparation.
monkeypatch.trainer.trl_vllm Monkeypatches for TRL’s vLLM integration and trainer utils.
monkeypatch.trainer.utils
monkeypatch.transformers.trainer_loss_calc Module for patching transformers Trainer loss calculation to use nanmean.
monkeypatch.xformers_ Fused MLP layer for incrementally improved training efficiency

Utils

Utility functions

utils.tokenization Module for tokenization utilities
utils.chat_templates This module provides functionality for selecting chat templates based on user choices.
utils.chat_templates.base utility functions for chat templates
utils.lora module to get the state dict of a merged lora model
utils.model_shard_quant module to handle loading model on cpu/meta device for FSDP
utils.bench Benchmarking and measurement utilities
utils.comet_ Module for Comet utilities
utils.config Module for working with config dicts
utils.cuda13 Helpers for CUDA 13 uv images.
utils.datasets helper functions for datasets
utils.environment utils to get GPU info for the current environment
utils.fp32_norms Helpers for keeping selected norm modules in fp32 under FSDP2.
utils.freeze module to freeze/unfreeze parameters by name
utils.import_helper Helper for importing modules from strings
utils.logging Logging helpers to only log on main process.
utils.mlflow_ Module for mlflow utilities
utils.tee Utilities for managing the debug log file and providing a file-only stream for logging
utils.trackio_ Module for trackio utilities
utils.train Training utils for checkpoints
utils.trainer Module containing the Trainer class and related functions
utils.wandb_ Module for wandb utilities
utils.weight_serde Serialize / deserialize tensors for HTTP and IPC weight sync.
utils.schedulers Module for custom LRScheduler class
utils.distributed Utilities for distributed functionality.
utils.dict Module containing the DictDefault class
utils.generation.sft Sample generation utilities for SFT/Pretrain training.
utils.mistral.mistral3_processor Processor for Mistral3 multimodal models with image support
utils.mistral.mistral_tokenizer Wrapper for MistralTokenizer from mistral-common
utils.optimizers.adopt Copied from https://github.com/iShohei220/adopt
utils.optimizers.qgalore Helpers for the Q-GaLore optimizer integration.
utils.data.streaming Data handling specific to streaming datasets.
utils.data.sft Data handling specific to SFT.
utils.data.rl Data handling specific to RL trainers.
utils.data.lock Logic for loading / preparing a dataset once over all processes.
utils.data.utils Data handling helpers
utils.data.wrappers Data handling specific to SFT.
utils.quantization Utilities for quantization including QAT and PTQ using torchao.

Schemas

Pydantic data models for Axolotl config

utils.schemas.config Module with Pydantic models for configuration.
utils.schemas.model Pydantic models for model input / output, etc. configuration
utils.schemas.training Pydantic models for training hyperparameters
utils.schemas.datasets Pydantic models for datasets-related configuration
utils.schemas.peft Pydantic models for PEFT-related configuration
utils.schemas.trl Pydantic models for TRL trainer configuration
utils.schemas.multimodal Pydantic models for multimodal-related configuration
utils.schemas.integrations Pydantic models for Axolotl integrations
utils.schemas.deprecated Pydantic models for deprecated and remapped configuration parameters
utils.schemas.dynamic_checkpoint Schema for dynamic checkpoint configuration.
utils.schemas.fsdp FSDP Configuration Schema
utils.schemas.quantization QAT Config Schema
utils.schemas.validation Module with validation methods for config pydantic model.
utils.schemas.vllm Pydantic models for VLLM configuration, used primarily for RL training with TRL + grpo
utils.schemas.enums Enums for Axolotl input config
utils.schemas.utils Utilities for Axolotl Pydantic models

Integrations

Third-party integrations and extensions

integrations.base Base class for all plugins.
integrations.config Module to handle merging the plugins’ input arguments with the base configurations.
integrations.cut_cross_entropy Module for the Plugin for Cut Cross Entropy integration with Axolotl.
integrations.cut_cross_entropy.args Module for handling Cut Cross Entropy input arguments.
integrations.densemixer.args Pydantic models for DenseMixer plugin
integrations.densemixer.plugin DenseMixer plugin for Axolotl
integrations.diffusion.args Config args for diffusion LM training (nested under diffusion:).
integrations.diffusion.callbacks Callbacks for diffusion training.
integrations.diffusion.generation Sample generation utilities for diffusion training.
integrations.diffusion.plugin Diffusion LM training plugin for Axolotl.
integrations.diffusion.trainer Custom trainer for diffusion LM training.
integrations.diffusion.utils Shared utilities for diffusion integration.
integrations.expert_parallel.args Pydantic args for the Expert-Parallel (DeepEP) plugin.
integrations.expert_parallel.buffer DeepEP Buffer singleton, lazily constructed on first call.
integrations.expert_parallel.experts_fn DeepEP-backed registered functions for ALL_EXPERTS_FUNCTIONS.
integrations.expert_parallel.plugin Expert-Parallel (DeepEP) plugin for axolotl.
integrations.expert_parallel.shard Generic expert-weight sharding for @use_experts_implementation modules.
integrations.grokfast.args config args for grokfast plugin
integrations.grokfast.optimizer
integrations.hatchery.args Pydantic config schema for the Hatchery integration.
integrations.hatchery.data Convert axolotl batch tensors to Tinker/Hatchery Datum format.
integrations.hatchery.plugin Axolotl plugin that routes training to a remote Hatchery/Tinker API.
integrations.hatchery.rewards.math_reward Math reward function for hendrycks_math GRPO training.
integrations.hatchery.rl_trainer Remote RL trainer (GRPO/PPO) using Tinker or Hatchery API.
integrations.hatchery.trainer Remote trainer that dispatches to Tinker or Hatchery API.
integrations.kd Plugin init to add KD support to Axolotl.
integrations.kd.args Plugin args for KD support.
integrations.kd.callbacks Transformers trainer callbacks to schedule the KD temperature during training
integrations.kd.chat_template Chat template prompt strategy loader with KD support
integrations.kd.collator DataCollator for axolotl to handle KD fields without using -inf for padding,
integrations.kd.collator_online_teacher Packed data loader for online teacher training supporting vllm and sglang.
integrations.kd.kernels.liger Liger Kernels for Chunked Top-K Log-Prob Distillation
integrations.kd.topk_logprob.forward_kl loss for top_k KL divergence
integrations.kd.trainer KD trainer
integrations.kd.utils Helper KD utils
integrations.kernels.args
integrations.kernels.autotune_callback Trainer callback for reporting Triton autotune results from scattermoe-lora kernels.
integrations.kernels.autotune_collector Collect Triton autotune results from scattermoe-lora kernels.
integrations.kernels.constants Diagnostic helpers for MoE kernel integrations (kernel dispatch itself
integrations.kernels.plugin
integrations.liger.args Module for handling LIGER input arguments.
integrations.liger.plugin Liger-Kernel Plugin for Axolotl
integrations.liger.utils utils to patch liger kernel ops to disable torch.compile
integrations.liger.models.base Generic FLCE patch for untested models similar to Llama
integrations.liger.models.deepseekv2 DeepseekV2 model with LigerFusedLinearCrossEntropyLoss
integrations.liger.models.jamba Jamba model with LigerFusedLinearCrossEntropyLoss
integrations.liger.models.llama4 Liger FLCE for llama4
integrations.liger.models.qwen3 Liger FLCE for Qwen3. Based on transformers v4.51.3.
integrations.liger.models.qwen3_5 Liger FLCE for Qwen3.5. Based on transformers v5.3.0.
integrations.liger.models.qwen3_5_moe Liger FLCE for Qwen3.5 MoE. Based on transformers v5.3.0.
integrations.liger.models.qwen3_moe Liger FLCE for Qwen3 MoE. Based on transformers v4.51.3.
integrations.llm_compressor.args LLMCompressor and Sparse Finetuning config models.
integrations.llm_compressor.plugin Sparse Finetuning plugin for Axolotl — enables handling of sparse neural networks
integrations.llm_compressor.utils Utilities for llmcompressor integration with axolotl.
integrations.lm_eval.args Module for handling lm eval harness input arguments.
integrations.lm_eval.cli axolotl CLI for running lm_eval tasks
integrations.mora.args Config args for MoRA / ReMoRA.
integrations.mora.plugin MoRA / ReMoRA plugin for Axolotl.
integrations.nemo_gym.args Input arguments for the NeMo Gym integration plugin.
integrations.nemo_gym.data_producer NeMo Gym Data Producer for async GRPO training.
integrations.nemo_gym.dataset Dataset loading for NeMo Gym JSONL files.
integrations.nemo_gym.multi_turn Multi-turn rollout function for NeMo Gym environments.
integrations.nemo_gym.plugin NeMo Gym Plugin for Axolotl.
integrations.nemo_gym.rewards NeMo Gym reward functions.
integrations.nemo_gym.server NeMo Gym server lifecycle management.
integrations.spectrum Spectrum Plugin to automatically generate unfrozen parameters based on SNR data.
integrations.spectrum.args Module for handling Spectrum input arguments.
integrations.swanlab.args SwanLab configuration arguments
integrations.swanlab.callbacks SwanLab callbacks for Axolotl trainers.
integrations.swanlab.completion_logger SwanLab completion logger for RLHF/DPO/KTO/ORPO/GRPO training.
integrations.swanlab.plugins SwanLab Plugin for Axolotl

Common

Common utilities and shared functionality

common.architectures Common architecture specific constants
common.const Various shared constants
common.datasets Dataset loading utilities.

Models

Custom model implementations

models.mamba.configuration_mamba HF Transformers MambaConfig
models.mamba.modeling_mamba

Data Processing

Data processing utilities

utils.collators.core basic shared collator constants
utils.collators.batching Data collators for axolotl to pad labels and position_ids for packed sequences
utils.collators.dpo DPO/ORPO/IPO/KTO data collator with pad_to_multiple_of support.
utils.collators.mamba collators for Mamba
utils.collators.mm_chat Collators for multi-modal chat messages and packing
utils.samplers.multipack Multipack Batch Sampler - An efficient batch sampler for packing variable-length sequences
utils.samplers.utils helper util to calculate dataset lengths

Callbacks

Training callbacks

utils.callbacks Callbacks for Trainer class
utils.callbacks.perplexity callback to calculate perplexity as an evaluation metric.
utils.callbacks.profiler HF Trainer callback for creating pytorch profiling snapshots
utils.callbacks.lisa module for LISA
utils.callbacks.mlflow_ MLFlow module for trainer callbacks
utils.callbacks.comet_ Comet module for trainer callbacks
utils.callbacks.qat QAT Callback for HF Causal Trainer
utils.callbacks.dynamic_checkpoint
utils.callbacks.generation Callback for generating samples during SFT/Pretrain training.
utils.callbacks.models Helper functions for model classes
utils.callbacks.opentelemetry OpenTelemetry metrics callback for Axolotl training
utils.callbacks.swanlab Callbacks for SwanLab integration
utils.callbacks.tokens_per_second A callback for calculating tokens per second during training.
utils.callbacks.trackio_ Trackio module for trainer callbacks

Scripts

Standalone helper scripts

scripts.process_cleanup Reusable process lifecycle management for vLLM serve scripts.
scripts.vllm_serve_lora vLLM serve script with native LoRA adapter support.
scripts.vllm_worker_ext Extended vLLM worker extension with batch weight sync support.

Telemetry

Usage telemetry

telemetry.callbacks Trainer callbacks for reporting runtime metrics at regular intervals.
telemetry.errors Telemetry utilities for exception and traceback information.
telemetry.manager Telemetry manager and associated utilities.
telemetry.runtime_metrics Telemetry utilities for runtime and memory metrics.