utils.schemas.validation
Module with validation methods for config pydantic model.
Classes
| Name | Description |
|---|---|
| AttentionValidationMixin | Validation methods related to attention mechanisms. |
| ChatTemplateValidationMixin | Validation methods related to chat template configuration. |
| ComplexValidationMixin | Complex validation methods that involve multiple systems. |
| DatasetValidationMixin | Validation methods related to dataset configuration. |
| DistributedValidationMixin | Validation for distributed training. |
| EBFTValidationMixin | Validation for EBFT (Energy-Based Fine-Tuning) configuration. |
| GRPOVllmValidationMixin | Validation mixin for vllm when using GRPO. |
| LoRAValidationMixin | Validation methods related to LoRA/QLoRA configuration. |
| ModelCompatibilityValidationMixin | Validation methods for specific model compatibility. |
| OptimizationValidationMixin | Validation methods related to optimization and performance. |
| PretrainingValidationMixin | Validation methods related to pretraining configuration. |
| RLValidationMixin | Validation methods related to RL training configuration. |
| SystemValidationMixin | Validation methods related to system and hardware configuration. |
| TrainingValidationMixin | Validation methods related to training configuration. |
| ValidationMixin | Full validation mixin for Axolotl configuration. |
AttentionValidationMixin
utils.schemas.validation.AttentionValidationMixin()
Validation methods related to attention mechanisms.
ChatTemplateValidationMixin
utils.schemas.validation.ChatTemplateValidationMixin()
Validation methods related to chat template configuration.
ComplexValidationMixin
utils.schemas.validation.ComplexValidationMixin()
Complex validation methods that involve multiple systems.
DatasetValidationMixin
utils.schemas.validation.DatasetValidationMixin()
Validation methods related to dataset configuration.
DistributedValidationMixin
utils.schemas.validation.DistributedValidationMixin()
Validation for distributed training.
EBFTValidationMixin
utils.schemas.validation.EBFTValidationMixin()
Validation for EBFT (Energy-Based Fine-Tuning) configuration.
Methods
| Name | Description |
|---|---|
| check_ebft_activation_offloading | activation_offloading replaces gradient checkpointing with FSDP-style wrapping, which conflicts with flex_attention’s use_reentrant requirement. |
| check_ebft_config_required | rl: ebft requires an ebft config section. |
| check_ebft_gradient_checkpointing_reentrant | flex_attention + non-reentrant gradient checkpointing causes CheckpointError. |
| check_ebft_strided_dataset_split | Warn about the common train_on_split mistake (silently ignored by schema). |
| check_ebft_strided_sequence_len | Warn if sequence_len is too large for single-GPU strided EBFT. |
| check_ebft_torch_compile | torch_compile + flex_attention + gradient_checkpointing causes dynamo recompiles and CheckpointErrors. |
check_ebft_activation_offloading
utils.schemas.validation.EBFTValidationMixin.check_ebft_activation_offloading()
activation_offloading replaces gradient checkpointing with FSDP-style wrapping, which conflicts with flex_attention’s use_reentrant requirement.
check_ebft_config_required
utils.schemas.validation.EBFTValidationMixin.check_ebft_config_required(data)
rl: ebft requires an ebft config section.
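The rule above can be sketched as a plain function over the raw config dictionary. This is a minimal illustration, not the actual implementation: the function name `require_ebft_section` is hypothetical, and it assumes that `rl` and `ebft` are top-level keys and that a violation raises `ValueError`.

```python
def require_ebft_section(data: dict) -> dict:
    """Hypothetical sketch: reject `rl: ebft` configs that lack an `ebft` section.

    Assumes `rl` and `ebft` are top-level keys of the raw config dict.
    """
    if data.get("rl") == "ebft" and not data.get("ebft"):
        raise ValueError("rl: ebft requires an `ebft` config section")
    # Configs that do not use EBFT, or that provide the section, pass through.
    return data
```

Returning the dictionary unchanged mirrors the usual shape of a "before" validator, which receives the raw input and hands it on to the next check.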
check_ebft_gradient_checkpointing_reentrant
utils.schemas.validation.EBFTValidationMixin.check_ebft_gradient_checkpointing_reentrant()
flex_attention + non-reentrant gradient checkpointing causes CheckpointError.
check_ebft_strided_dataset_split
utils.schemas.validation.EBFTValidationMixin.check_ebft_strided_dataset_split(
    data,
)
Warn about the common train_on_split mistake (silently ignored by schema).
check_ebft_strided_sequence_len
utils.schemas.validation.EBFTValidationMixin.check_ebft_strided_sequence_len(
    data,
)
Warn if sequence_len is too large for single-GPU strided EBFT.
check_ebft_torch_compile
utils.schemas.validation.EBFTValidationMixin.check_ebft_torch_compile(data)
torch_compile + flex_attention + gradient_checkpointing causes dynamo recompiles and CheckpointErrors. The flex_attention kernel compiles itself internally, so whole-model torch.compile is not needed and is actively harmful.
GRPOVllmValidationMixin
utils.schemas.validation.GRPOVllmValidationMixin()
Validation mixin for vLLM when using GRPO.
LoRAValidationMixin
utils.schemas.validation.LoRAValidationMixin()
Validation methods related to LoRA/QLoRA configuration.
ModelCompatibilityValidationMixin
utils.schemas.validation.ModelCompatibilityValidationMixin()
Validation methods for specific model compatibility.
OptimizationValidationMixin
utils.schemas.validation.OptimizationValidationMixin()
Validation methods related to optimization and performance.
Methods
| Name | Description |
|---|---|
| check_cross_entropy_conflicts | Check for mutual exclusivity between cross entropy patch options. |
check_cross_entropy_conflicts
utils.schemas.validation.OptimizationValidationMixin.check_cross_entropy_conflicts(
    data,
)
Check for mutual exclusivity between cross entropy patch options.
Only one of the following can be enabled at a time:
- cut_cross_entropy (CutCrossEntropyPlugin)
- chunked_cross_entropy
- liger_cross_entropy (LigerPlugin)
- liger_fused_linear_cross_entropy (LigerPlugin)
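A mutual-exclusivity check of this kind can be sketched as follows. The option names come from the list above; everything else is an assumption made for illustration: the function name `find_cross_entropy_conflicts` is hypothetical, the options are treated as flat top-level boolean keys, and a conflict is reported by raising `ValueError`.

```python
# Option names taken from the list above; treated here as flat boolean keys
# (an assumption, the real config may nest plugin options differently).
CROSS_ENTROPY_OPTIONS = (
    "cut_cross_entropy",
    "chunked_cross_entropy",
    "liger_cross_entropy",
    "liger_fused_linear_cross_entropy",
)


def find_cross_entropy_conflicts(data: dict) -> dict:
    """Hypothetical sketch: allow at most one cross entropy patch option."""
    enabled = [name for name in CROSS_ENTROPY_OPTIONS if data.get(name)]
    if len(enabled) > 1:
        raise ValueError(
            "Only one cross entropy patch can be enabled at a time, got: "
            + ", ".join(enabled)
        )
    return data
```

Listing every enabled option in the error message lets the user see at a glance which settings to turn off.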
PretrainingValidationMixin
utils.schemas.validation.PretrainingValidationMixin()
Validation methods related to pretraining configuration.
RLValidationMixin
utils.schemas.validation.RLValidationMixin()
Validation methods related to RL training configuration.
Methods
| Name | Description |
|---|---|
| check_grpo_batch_size_divisibility | Surface GRPO batch-shape mismatches at config-parse time. |
check_grpo_batch_size_divisibility
utils.schemas.validation.RLValidationMixin.check_grpo_batch_size_divisibility(
    data,
)
Surface GRPO batch-shape mismatches at config-parse time.
TRL’s GRPOTrainer requires that the per-step generation batch size be
evenly divisible by num_generations so that every prompt can be
replicated exactly num_generations times. The runtime check inside
GRPOTrainer.__init__ only fires after the model has been loaded —
too late and too cryptic for the user. We replicate the check here so
the failure is immediate and actionable.
Also enforces:
- num_generations >= 2 (group-relative advantage needs variance)
- effective_gbs >= num_generations * world_size when capabilities indicate multiple ranks (each rank needs at least one full group)
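The three conditions described above can be written out as a small standalone function. This is only a sketch under stated assumptions: the name `check_grpo_batch_shape` is hypothetical, `effective_gbs` is passed in directly rather than derived from the config (how it is computed is not shown in this reference), and each violation raises `ValueError`.

```python
def check_grpo_batch_shape(
    effective_gbs: int, num_generations: int, world_size: int = 1
) -> None:
    """Hypothetical sketch of the GRPO batch-shape rules described above."""
    # Group-relative advantages need at least two samples per prompt.
    if num_generations < 2:
        raise ValueError("num_generations must be >= 2")
    # Every prompt must be replicated exactly num_generations times.
    if effective_gbs % num_generations != 0:
        raise ValueError(
            f"generation batch size ({effective_gbs}) must be divisible by "
            f"num_generations ({num_generations})"
        )
    # With multiple ranks, each rank needs at least one full group.
    if world_size > 1 and effective_gbs < num_generations * world_size:
        raise ValueError(
            f"generation batch size ({effective_gbs}) must be at least "
            f"num_generations * world_size ({num_generations * world_size})"
        )
```

For example, `check_grpo_batch_shape(16, 4, world_size=2)` passes, while a batch size of 10 with `num_generations=4` fails the divisibility rule.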
SystemValidationMixin
utils.schemas.validation.SystemValidationMixin()
Validation methods related to system and hardware configuration.
TrainingValidationMixin
utils.schemas.validation.TrainingValidationMixin()
Validation methods related to training configuration.
ValidationMixin
utils.schemas.validation.ValidationMixin()
Full validation mixin for Axolotl configuration.