core.trainers.grpo.args

Axolotl-specific training arguments.

Classes

Name                    Description
AxolotlAsyncGRPOConfig  Axolotl async GRPO config: adds async prefetch, streaming scoring, and importance-sampling (IS) correction.
AxolotlGRPOConfig       Axolotl config for GRPO training.

AxolotlAsyncGRPOConfig

core.trainers.grpo.args.AxolotlAsyncGRPOConfig(
    use_data_producer=False,
    async_prefetch=False,
    prefetch_depth=1,
    vllm_sync_interval=1,
    batch_flattening=False,
    streaming_partial_batch=False,
    streaming_min_groups=1,
    vllm_importance_sampling_correction=True,
    vllm_importance_sampling_mode='token_truncate',
    vllm_importance_sampling_cap=3.0,
    off_policy_mask_threshold=None,
    use_bias_correction_kl=False,
    reward_num_workers=1,
    replay_buffer_size=0,
    replay_recompute_logps=True,
    reroll_start_fraction=0.5,
    reroll_max_groups=1,
    skip_zero_advantage_batches=True,
    vllm_lora_sync=False,
    context_parallel_size=None,
)

Axolotl async GRPO config: adds async prefetch, streaming scoring, and importance-sampling (IS) correction.
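The IS correction mentioned above can be illustrated with a small sketch. It assumes that `vllm_importance_sampling_mode='token_truncate'` means a per-token importance ratio between the training policy and the vLLM sampler, clipped from above at `vllm_importance_sampling_cap`. That reading is an assumption, not something this page confirms, and the function name `truncated_is_weights` is hypothetical rather than part of Axolotl.

```python
import math


def truncated_is_weights(train_logps, sampler_logps, cap=3.0):
    """Per-token importance weights, truncated from above at `cap`.

    Hypothetical helper: a sketch of truncated importance sampling,
    not Axolotl's actual implementation.
    """
    weights = []
    for lp_train, lp_sampler in zip(train_logps, sampler_logps):
        # Ratio of training-policy probability to sampler probability.
        ratio = math.exp(lp_train - lp_sampler)
        # Truncation bounds the variance contributed by large ratios.
        weights.append(min(ratio, cap))
    return weights


# Log-probabilities of the same three tokens under each policy.
train_logps = [-1.0, -0.5, -2.0]
sampler_logps = [-1.0, -2.5, -1.5]
weights = truncated_is_weights(train_logps, sampler_logps)
```

Here the second token has a raw ratio of exp(2.0), which is above the cap, so its weight is truncated to 3.0; the other two tokens keep their raw ratios.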

AxolotlGRPOConfig

core.trainers.grpo.args.AxolotlGRPOConfig(context_parallel_size=None)

Axolotl config for GRPO training.
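For context, GRPO scores each completion relative to the other completions sampled for the same prompt. The sketch below shows one common formulation of that group-relative advantage, and why a group with identical rewards yields all-zero advantages and therefore no learning signal, which is presumably what `skip_zero_advantage_batches` targets. The helper names, the use of the population standard deviation, and the `eps` value are assumptions for illustration, not Axolotl's implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one prompt's group of completions.

    Hypothetical helper: assumes mean/std normalization with a small
    `eps` to avoid division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def is_zero_advantage(advantages, tol=1e-8):
    """True when every advantage in the group is (numerically) zero."""
    return all(abs(a) <= tol for a in advantages)


# A group with mixed rewards carries signal; a uniform group does not.
mixed = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

Under these assumptions, `is_zero_advantage(uniform)` is true, so a trainer could skip that group without changing the gradient.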