core.trainers.ebft.args
EBFT-specific training arguments.
Two config classes:
- AxolotlEBFTConfig: extends GRPOConfig for structured QA data (uses vLLM generation)
- AxolotlStridedEBFTConfig: extends TrainingArguments for unstructured text (strided generation)
Classes
| Name | Description |
|---|---|
| AxolotlAsyncEBFTConfig | EBFT config for async structured QA data — extends FastAsyncGRPOConfig. |
| AxolotlEBFTConfig | EBFT config for structured QA data — extends GRPOConfig. |
| AxolotlStridedEBFTConfig | EBFT config for unstructured text with strided block-parallel generation. |
| EBFTFieldsMixin | Common fields shared between structured and strided EBFT configs. |
AxolotlAsyncEBFTConfig
core.trainers.ebft.args.AxolotlAsyncEBFTConfig(
use_data_producer=False,
async_prefetch=False,
prefetch_depth=1,
vllm_sync_interval=1,
batch_flattening=False,
streaming_partial_batch=False,
streaming_min_groups=1,
vllm_importance_sampling_correction=True,
vllm_importance_sampling_mode='token_truncate',
vllm_importance_sampling_cap=3.0,
off_policy_mask_threshold=None,
use_bias_correction_kl=False,
reward_num_workers=1,
replay_buffer_size=0,
replay_recompute_logps=True,
reroll_start_fraction=0.5,
reroll_max_groups=1,
skip_zero_advantage_batches=True,
vllm_lora_sync=False,
ebft_feature_layers=(lambda: [0.25, 0.5, 0.75])(),
ebft_embed_method='last_token',
ebft_use_whitening=False,
ebft_alignment_coef=1.0,
ebft_diversity_coef=1.0,
ebft_ce_coef=0.0,
ebft_adaptive_max_tokens=True,
ebft_gt_length_multiplier=1.5,
)
EBFT config for async structured QA data; extends FastAsyncGRPOConfig.
Includes all async fields: async_prefetch, vllm_lora_sync, skip_zero_advantage_batches, streaming_partial_batch, replay_buffer_size, etc.
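The defaults `vllm_importance_sampling_mode='token_truncate'` and `vllm_importance_sampling_cap=3.0` suggest a truncated importance-sampling correction applied per token. The sketch below is an illustration under that assumption only: the function name `truncated_is_weights` and the sample log-probabilities are hypothetical, and the actual trainer may compute the correction differently.

```python
import math


def truncated_is_weights(train_logps, sampler_logps, cap=3.0):
    """Per-token importance weights, clipped from above at `cap`.

    Assumption: 'token_truncate' means each token's probability ratio
    (training policy over sampling policy) is truncated independently.
    """
    weights = []
    for lp_train, lp_sampler in zip(train_logps, sampler_logps):
        ratio = math.exp(lp_train - lp_sampler)  # pi_train / pi_sampler
        weights.append(min(ratio, cap))          # truncate large ratios
    return weights


# Made-up log-probabilities for three tokens of one completion.
train_logps = [-1.0, -0.5, -0.1]
sampler_logps = [-1.0, -1.0, -3.0]
weights = truncated_is_weights(train_logps, sampler_logps, cap=3.0)
```

Truncating only from above bounds the variance contributed by tokens that the sampler considered unlikely, while leaving small ratios untouched.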
AxolotlEBFTConfig
core.trainers.ebft.args.AxolotlEBFTConfig(
ebft_feature_layers=(lambda: [0.25, 0.5, 0.75])(),
ebft_embed_method='last_token',
ebft_use_whitening=False,
ebft_alignment_coef=1.0,
ebft_diversity_coef=1.0,
ebft_ce_coef=0.0,
ebft_adaptive_max_tokens=True,
ebft_gt_length_multiplier=1.5,
vllm_lora_sync=False,
)
EBFT config for structured QA data; extends GRPOConfig.
AxolotlStridedEBFTConfig
core.trainers.ebft.args.AxolotlStridedEBFTConfig(
ebft_feature_layers=(lambda: [0.25, 0.5, 0.75])(),
ebft_embed_method='last_token',
ebft_use_whitening=False,
ebft_alignment_coef=1.0,
ebft_diversity_coef=1.0,
ebft_ce_coef=0.0,
ebft_adaptive_max_tokens=True,
ebft_gt_length_multiplier=1.5,
ebft_stride=8,
ebft_context_length=8,
ebft_generate_max_len=8,
ebft_n_samples_per_prompt=4,
ebft_temperature=0.6,
ebft_top_p=1.0,
ebft_rl_coef=1.0,
ebft_advantage_estimator='rloo',
ebft_min_completion_prefix=0,
)
EBFT config for unstructured text with strided block-parallel generation.
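The default `ebft_advantage_estimator='rloo'` presumably refers to the standard REINFORCE leave-one-out baseline, computed over the `ebft_n_samples_per_prompt` samples drawn for each prompt. A minimal sketch of that estimator, assuming the standard definition; `rloo_advantages` and the reward values are hypothetical and not taken from the trainer:

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for one prompt's group of sampled rewards."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    # The baseline for sample i is the mean reward of the other n - 1 samples.
    return [r - (total - r) / (n - 1) for r in rewards]


# Made-up rewards for the 4 samples of a single prompt.
group_rewards = [1.0, 0.0, 0.5, 0.5]
advantages = rloo_advantages(group_rewards)
```

Because each sample's baseline excludes its own reward, the advantages within a group sum to zero, and a sample that matches the mean of its peers gets an advantage of zero.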
EBFTFieldsMixin
core.trainers.ebft.args.EBFTFieldsMixin(
ebft_feature_layers=(lambda: [0.25, 0.5, 0.75])(),
ebft_embed_method='last_token',
ebft_use_whitening=False,
ebft_alignment_coef=1.0,
ebft_diversity_coef=1.0,
ebft_ce_coef=0.0,
ebft_adaptive_max_tokens=True,
ebft_gt_length_multiplier=1.5,
)
Common fields shared between structured and strided EBFT configs.
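The rendered default `(lambda: [0.25, 0.5, 0.75])()` is what a dataclass `default_factory` typically looks like in generated signatures. The sketch below mirrors the listed defaults in a stand-in dataclass to show how shared fields compose into a subclass. It assumes the mixin is a dataclass; `EBFTFieldsSketch` and `StridedSketch` are hypothetical names, not the library's classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EBFTFieldsSketch:
    """Hypothetical stand-in mirroring the defaults listed for EBFTFieldsMixin."""

    # A mutable default (a list) has to go through default_factory,
    # so every instance gets its own fresh copy.
    ebft_feature_layers: List[float] = field(
        default_factory=lambda: [0.25, 0.5, 0.75]
    )
    ebft_embed_method: str = "last_token"
    ebft_use_whitening: bool = False
    ebft_alignment_coef: float = 1.0
    ebft_diversity_coef: float = 1.0
    ebft_ce_coef: float = 0.0
    ebft_adaptive_max_tokens: bool = True
    ebft_gt_length_multiplier: float = 1.5


@dataclass
class StridedSketch(EBFTFieldsSketch):
    """Hypothetical subclass adding two of the strided-only fields."""

    ebft_stride: int = 8
    ebft_n_samples_per_prompt: int = 4


default_cfg = StridedSketch()
custom_cfg = StridedSketch(ebft_feature_layers=[0.5], ebft_ce_coef=0.1)
default_cfg.ebft_feature_layers.append(1.0)  # other instances are unaffected
```

Overriding a shared field on one config leaves the remaining defaults, inherited and subclass-specific alike, unchanged.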