utils.schemas.validation

Module with validation methods for config pydantic model.

Classes

Name Description
AttentionValidationMixin Validation methods related to attention mechanisms.
ChatTemplateValidationMixin Validation methods related to chat template configuration.
ComplexValidationMixin Complex validation methods that involve multiple systems.
DatasetValidationMixin Validation methods related to dataset configuration.
DistributedValidationMixin Validation for distributed training.
EBFTValidationMixin Validation for EBFT (Energy-Based Fine-Tuning) configuration.
GRPOVllmValidationMixin Validation mixin for vllm when using GRPO.
LoRAValidationMixin Validation methods related to LoRA/QLoRA configuration.
ModelCompatibilityValidationMixin Validation methods for specific model compatibility.
OptimizationValidationMixin Validation methods related to optimization and performance.
PretrainingValidationMixin Validation methods related to pretraining configuration.
RLValidationMixin Validation methods related to RL training configuration.
SystemValidationMixin Validation methods related to system and hardware configuration.
TrainingValidationMixin Validation methods related to training configuration.
ValidationMixin Full validation mixin for Axolotl configuration.

AttentionValidationMixin

utils.schemas.validation.AttentionValidationMixin()

Validation methods related to attention mechanisms.

ChatTemplateValidationMixin

utils.schemas.validation.ChatTemplateValidationMixin()

Validation methods related to chat template configuration.

ComplexValidationMixin

utils.schemas.validation.ComplexValidationMixin()

Complex validation methods that involve multiple systems.

DatasetValidationMixin

utils.schemas.validation.DatasetValidationMixin()

Validation methods related to dataset configuration.

DistributedValidationMixin

utils.schemas.validation.DistributedValidationMixin()

Validation for distributed training.

EBFTValidationMixin

utils.schemas.validation.EBFTValidationMixin()

Validation for EBFT (Energy-Based Fine-Tuning) configuration.

Methods

Name Description
check_ebft_activation_offloading activation_offloading replaces gradient checkpointing with FSDP-style wrapping, which conflicts with flex_attention’s use_reentrant requirement.
check_ebft_config_required rl: ebft requires an ebft config section.
check_ebft_gradient_checkpointing_reentrant flex_attention + non-reentrant gradient checkpointing causes CheckpointError.
check_ebft_strided_dataset_split Warn about the common train_on_split mistake (silently ignored by schema).
check_ebft_strided_sequence_len Warn if sequence_len is too large for single-GPU strided EBFT.
check_ebft_torch_compile torch_compile + flex_attention + gradient_checkpointing causes dynamo recompiles and CheckpointErrors.

check_ebft_activation_offloading
utils.schemas.validation.EBFTValidationMixin.check_ebft_activation_offloading()

activation_offloading replaces gradient checkpointing with FSDP-style wrapping, which conflicts with flex_attention’s use_reentrant requirement.

check_ebft_config_required
utils.schemas.validation.EBFTValidationMixin.check_ebft_config_required(data)

rl: ebft requires an ebft config section.
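
The rule above can be illustrated with a minimal sketch. It assumes the validator receives the raw config as a plain dict (suggested by the `data` argument) and that the section is keyed `ebft`; the function name `require_ebft_section` and the error text are hypothetical and are not taken from the Axolotl source.

```python
from typing import Any


def require_ebft_section(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for check_ebft_config_required.

    Raises ValueError when `rl` is set to "ebft" but no non-empty
    `ebft` section is present. Returns the data unchanged otherwise.
    """
    if data.get("rl") == "ebft" and not data.get("ebft"):
        raise ValueError("rl: ebft requires an `ebft` config section")
    return data


# A config that selects another RL method passes through untouched.
require_ebft_section({"rl": "dpo"})

# A config with rl: ebft and a populated section also passes.
# `placeholder_option` is an illustrative key, not a real setting.
require_ebft_section({"rl": "ebft", "ebft": {"placeholder_option": True}})
```

Returning the data unchanged mirrors the usual shape of a pre-validation hook, where the checked dict is handed on to the next validator.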

check_ebft_gradient_checkpointing_reentrant
utils.schemas.validation.EBFTValidationMixin.check_ebft_gradient_checkpointing_reentrant()

flex_attention + non-reentrant gradient checkpointing causes CheckpointError.

check_ebft_strided_dataset_split
utils.schemas.validation.EBFTValidationMixin.check_ebft_strided_dataset_split(
    data,
)

Warn about the common train_on_split mistake (silently ignored by schema).

check_ebft_strided_sequence_len
utils.schemas.validation.EBFTValidationMixin.check_ebft_strided_sequence_len(
    data,
)

Warn if sequence_len is too large for single-GPU strided EBFT.

check_ebft_torch_compile
utils.schemas.validation.EBFTValidationMixin.check_ebft_torch_compile(data)

torch_compile + flex_attention + gradient_checkpointing causes dynamo recompiles and CheckpointErrors. The flex_attention kernel compiles itself internally, so whole-model torch.compile is not needed and is actively harmful.
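
As a rough sketch of the three-way conflict described above, the helper below flags a config dict in which all three options are enabled. It assumes each option is a top-level key that is truthy when enabled, and it does not model whether the real check also depends on `rl: ebft`; the function name `has_compile_conflict` is hypothetical.

```python
from typing import Any

# Option names taken from the description above; treating each one as a
# top-level, truthy-when-enabled config key is an assumption.
CONFLICTING_OPTIONS = ("torch_compile", "flex_attention", "gradient_checkpointing")


def has_compile_conflict(data: dict[str, Any]) -> bool:
    """Return True only when all three conflicting options are enabled."""
    return all(bool(data.get(name)) for name in CONFLICTING_OPTIONS)


config = {
    "torch_compile": True,
    "flex_attention": True,
    "gradient_checkpointing": True,
}
if has_compile_conflict(config):
    print("Disable torch_compile: flex_attention already compiles its own kernel.")
```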

GRPOVllmValidationMixin

utils.schemas.validation.GRPOVllmValidationMixin()

Validation mixin for vllm when using GRPO.

LoRAValidationMixin

utils.schemas.validation.LoRAValidationMixin()

Validation methods related to LoRA/QLoRA configuration.

ModelCompatibilityValidationMixin

utils.schemas.validation.ModelCompatibilityValidationMixin()

Validation methods for specific model compatibility.

OptimizationValidationMixin

utils.schemas.validation.OptimizationValidationMixin()

Validation methods related to optimization and performance.

Methods

Name Description
check_cross_entropy_conflicts Check for mutual exclusivity between cross entropy patch options.

check_cross_entropy_conflicts
utils.schemas.validation.OptimizationValidationMixin.check_cross_entropy_conflicts(
    data,
)

Check for mutual exclusivity between cross entropy patch options.

Only one of the following can be enabled at a time:
  • cut_cross_entropy (CutCrossEntropyPlugin)
  • chunked_cross_entropy
  • liger_cross_entropy (LigerPlugin)
  • liger_fused_linear_cross_entropy (LigerPlugin)
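
The mutual-exclusivity rule can be sketched as follows. The four option names come from the list above; treating them as top-level, truthy-when-enabled keys of a plain dict is an assumption, and the function name `reject_cross_entropy_conflicts` is hypothetical rather than the module's actual implementation.

```python
from typing import Any

# The four cross entropy patch options named in the documentation.
CROSS_ENTROPY_OPTIONS = (
    "cut_cross_entropy",
    "chunked_cross_entropy",
    "liger_cross_entropy",
    "liger_fused_linear_cross_entropy",
)


def enabled_cross_entropy_options(data: dict[str, Any]) -> list[str]:
    """List the cross entropy options that are truthy in the config."""
    return [name for name in CROSS_ENTROPY_OPTIONS if data.get(name)]


def reject_cross_entropy_conflicts(data: dict[str, Any]) -> dict[str, Any]:
    """Raise ValueError if more than one cross entropy option is enabled."""
    enabled = enabled_cross_entropy_options(data)
    if len(enabled) > 1:
        raise ValueError(
            "Only one cross entropy option may be enabled, got: "
            + ", ".join(enabled)
        )
    return data


# Exactly one option enabled: accepted.
reject_cross_entropy_conflicts({"cut_cross_entropy": True})
```

Collecting the enabled names before raising lets the error message name every conflicting option at once, instead of reporting only the first pair found.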

PretrainingValidationMixin

utils.schemas.validation.PretrainingValidationMixin()

Validation methods related to pretraining configuration.

RLValidationMixin

utils.schemas.validation.RLValidationMixin()

Validation methods related to RL training configuration.

Methods

Name Description
check_grpo_batch_size_divisibility Surface GRPO batch-shape mismatches at config-parse time.

check_grpo_batch_size_divisibility
utils.schemas.validation.RLValidationMixin.check_grpo_batch_size_divisibility(
    data,
)

Surface GRPO batch-shape mismatches at config-parse time.

TRL’s GRPOTrainer requires that the per-step generation batch size be evenly divisible by num_generations so that every prompt can be replicated exactly num_generations times. The runtime check inside GRPOTrainer.__init__ only fires after the model has been loaded, which is too late and too cryptic for the user. We replicate the check here so the failure is immediate and actionable.

Also enforces:
  • num_generations >= 2 (group-relative advantage needs variance)
  • effective_gbs >= num_generations * world_size when capabilities indicate multiple ranks (each rank needs at least one full group)
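
The three conditions above can be sketched as a small pure function. This is an illustration under stated assumptions, not the module's code: the name `grpo_batch_shape_errors` is hypothetical, the caller supplies an already-computed batch size, and the sketch treats the per-step generation batch size and effective_gbs as the same number, which the description does not confirm.

```python
def grpo_batch_shape_errors(
    effective_gbs: int,
    num_generations: int,
    world_size: int = 1,
) -> list[str]:
    """Collect human-readable problems with a GRPO batch shape.

    An empty list means all three documented conditions hold.
    """
    errors: list[str] = []

    # Group-relative advantage needs at least two samples per prompt.
    if num_generations < 2:
        errors.append(f"num_generations must be >= 2, got {num_generations}")
        return errors  # the remaining checks are meaningless without a valid group size

    # Every prompt must be replicated exactly num_generations times.
    if effective_gbs % num_generations != 0:
        errors.append(
            f"batch size {effective_gbs} is not divisible by "
            f"num_generations={num_generations}"
        )

    # With multiple ranks, each rank needs at least one full group.
    if world_size > 1 and effective_gbs < num_generations * world_size:
        errors.append(
            f"batch size {effective_gbs} is smaller than "
            f"num_generations * world_size = {num_generations * world_size}"
        )

    return errors


for message in grpo_batch_shape_errors(effective_gbs=6, num_generations=4):
    print(message)
```

Returning a list of messages rather than raising on the first failure lets a caller report every shape problem in one pass, which fits the stated goal of making the failure immediate and actionable.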

SystemValidationMixin

utils.schemas.validation.SystemValidationMixin()

Validation methods related to system and hardware configuration.

TrainingValidationMixin

utils.schemas.validation.TrainingValidationMixin()

Validation methods related to training configuration.

ValidationMixin

utils.schemas.validation.ValidationMixin()

Full validation mixin for Axolotl configuration.