core.trainers.ebft.args

EBFT-specific training arguments.

Two config classes:

- AxolotlEBFTConfig: extends GRPOConfig for structured QA data (uses vLLM generation)
- AxolotlStridedEBFTConfig: extends TrainingArguments for unstructured text (strided generation)

Classes

| Name | Description |
| --- | --- |
| AxolotlAsyncEBFTConfig | EBFT config for async structured QA data (extends FastAsyncGRPOConfig). |
| AxolotlEBFTConfig | EBFT config for structured QA data (extends GRPOConfig). |
| AxolotlStridedEBFTConfig | EBFT config for unstructured text with strided block-parallel generation. |
| EBFTFieldsMixin | Common fields shared between structured and strided EBFT configs. |
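
The descriptions above imply a simple decision rule for choosing a config class. The sketch below encodes it as a lookup. `pick_ebft_config` and its parameters are hypothetical names introduced for illustration and are not part of the module; the function only returns class names as strings and does not import the library.

```python
def pick_ebft_config(structured_qa: bool, use_async: bool = False) -> str:
    """Return the name of the config class that matches the data layout.

    Hypothetical helper: it restates the class table and nothing more.
    """
    if structured_qa:
        # Structured QA data: the async variant or the plain GRPO-based one.
        return "AxolotlAsyncEBFTConfig" if use_async else "AxolotlEBFTConfig"
    if use_async:
        # This page lists no async config for unstructured text.
        raise ValueError("no async strided config is listed on this page")
    # Unstructured text: strided block-parallel generation.
    return "AxolotlStridedEBFTConfig"


print(pick_ebft_config(structured_qa=True))
print(pick_ebft_config(structured_qa=True, use_async=True))
print(pick_ebft_config(structured_qa=False))
```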

AxolotlAsyncEBFTConfig

core.trainers.ebft.args.AxolotlAsyncEBFTConfig(
    use_data_producer=False,
    async_prefetch=False,
    prefetch_depth=1,
    vllm_sync_interval=1,
    batch_flattening=False,
    streaming_partial_batch=False,
    streaming_min_groups=1,
    vllm_importance_sampling_correction=True,
    vllm_importance_sampling_mode='token_truncate',
    vllm_importance_sampling_cap=3.0,
    off_policy_mask_threshold=None,
    use_bias_correction_kl=False,
    reward_num_workers=1,
    replay_buffer_size=0,
    replay_recompute_logps=True,
    reroll_start_fraction=0.5,
    reroll_max_groups=1,
    skip_zero_advantage_batches=True,
    vllm_lora_sync=False,
    ebft_feature_layers=(lambda: [0.25, 0.5, 0.75])(),
    ebft_embed_method='last_token',
    ebft_use_whitening=False,
    ebft_alignment_coef=1.0,
    ebft_diversity_coef=1.0,
    ebft_ce_coef=0.0,
    ebft_adaptive_max_tokens=True,
    ebft_gt_length_multiplier=1.5,
)

EBFT config for async structured QA data — extends FastAsyncGRPOConfig.

Includes all async fields: async_prefetch, vllm_lora_sync, skip_zero_advantage_batches, streaming_partial_batch, replay_buffer_size, etc.
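
The defaults `vllm_importance_sampling_mode='token_truncate'` and `vllm_importance_sampling_cap=3.0` suggest a per-token truncated importance weight. The following is a minimal sketch of that general technique, assuming the ratio is taken between the trainer's and the sampler's log-probabilities and clipped from above at the cap. The function name and argument names are hypothetical, and the trainer's actual computation may differ.

```python
import math


def truncated_is_weights(train_logps, sampler_logps, cap=3.0):
    """Per-token importance weights exp(train - sampler), truncated at `cap`.

    Assumption: "token_truncate" means each token's ratio is clipped
    independently; this is an illustration, not the library's code.
    """
    return [
        min(math.exp(t - s), cap)
        for t, s in zip(train_logps, sampler_logps)
    ]


# Token 0: both policies agree, so the weight is 1.0.
# Token 1: the raw ratio is exp(2.0), about 7.39, so it is truncated to 3.0.
weights = truncated_is_weights([0.0, 0.0], [0.0, -2.0])
print(weights)
```

Truncating rather than dropping tokens keeps every token in the loss while bounding the variance that a large ratio would introduce.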

AxolotlEBFTConfig

core.trainers.ebft.args.AxolotlEBFTConfig(
    ebft_feature_layers=(lambda: [0.25, 0.5, 0.75])(),
    ebft_embed_method='last_token',
    ebft_use_whitening=False,
    ebft_alignment_coef=1.0,
    ebft_diversity_coef=1.0,
    ebft_ce_coef=0.0,
    ebft_adaptive_max_tokens=True,
    ebft_gt_length_multiplier=1.5,
    vllm_lora_sync=False,
)

EBFT config for structured QA data — extends GRPOConfig.

AxolotlStridedEBFTConfig

core.trainers.ebft.args.AxolotlStridedEBFTConfig(
    ebft_feature_layers=(lambda: [0.25, 0.5, 0.75])(),
    ebft_embed_method='last_token',
    ebft_use_whitening=False,
    ebft_alignment_coef=1.0,
    ebft_diversity_coef=1.0,
    ebft_ce_coef=0.0,
    ebft_adaptive_max_tokens=True,
    ebft_gt_length_multiplier=1.5,
    ebft_stride=8,
    ebft_context_length=8,
    ebft_generate_max_len=8,
    ebft_n_samples_per_prompt=4,
    ebft_temperature=0.6,
    ebft_top_p=1.0,
    ebft_rl_coef=1.0,
    ebft_advantage_estimator='rloo',
    ebft_min_completion_prefix=0,
)

EBFT config for unstructured text with strided block-parallel generation.
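
With `ebft_n_samples_per_prompt=4` and `ebft_advantage_estimator='rloo'`, each prompt yields a small group of samples whose rewards are compared against one another. The sketch below shows the standard leave-one-out (RLOO) baseline in plain Python: each sample's advantage is its reward minus the mean reward of the other samples in the group. `rloo_advantages` is a hypothetical name, and how the trainer computes rewards or batches groups is not covered here.

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for one prompt's group of sampled rewards."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    # Baseline for sample i is the mean of the other n - 1 rewards.
    return [r - (total - r) / (n - 1) for r in rewards]


# Four samples for one prompt, matching the default group size.
advantages = rloo_advantages([1.0, 2.0, 3.0, 4.0])
print(advantages)
```

Because every baseline excludes the sample it is applied to, the advantages within a group sum to zero.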

EBFTFieldsMixin

core.trainers.ebft.args.EBFTFieldsMixin(
    ebft_feature_layers=(lambda: [0.25, 0.5, 0.75])(),
    ebft_embed_method='last_token',
    ebft_use_whitening=False,
    ebft_alignment_coef=1.0,
    ebft_diversity_coef=1.0,
    ebft_ce_coef=0.0,
    ebft_adaptive_max_tokens=True,
    ebft_gt_length_multiplier=1.5,
)

Common fields shared between structured and strided EBFT configs.
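
The rendered default `(lambda: [0.25, 0.5, 0.75])()` is how a dataclass `default_factory` appears in a generated signature. The sketch below mirrors the documented mixin fields in a standalone dataclass to show the pattern. `EBFTFieldsSketch` and `StridedSketch` are stand-ins written for this example, not the real classes, and the real configs also inherit from GRPOConfig or TrainingArguments, which is omitted here.

```python
from dataclasses import dataclass, field, fields


@dataclass
class EBFTFieldsSketch:
    """Stand-in that copies the documented EBFTFieldsMixin defaults."""

    # A mutable default needs default_factory so instances do not share it.
    ebft_feature_layers: list = field(
        default_factory=lambda: [0.25, 0.5, 0.75]
    )
    ebft_embed_method: str = "last_token"
    ebft_use_whitening: bool = False
    ebft_alignment_coef: float = 1.0
    ebft_diversity_coef: float = 1.0
    ebft_ce_coef: float = 0.0
    ebft_adaptive_max_tokens: bool = True
    ebft_gt_length_multiplier: float = 1.5


@dataclass
class StridedSketch(EBFTFieldsSketch):
    """Stand-in subclass: inherits the shared fields and adds two more."""

    ebft_stride: int = 8
    ebft_context_length: int = 8


cfg = StridedSketch(ebft_ce_coef=0.1)
print([f.name for f in fields(cfg)])
```

Defining the shared fields once in a mixin keeps the structured and strided configs in agreement on names and defaults.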