integrations.swanlab.callbacks

SwanLab callbacks for Axolotl trainers.

This module provides HuggingFace Trainer callbacks for logging RLHF completions to SwanLab.

Classes

Name Description
SwanLabRLHFCompletionCallback Callback for logging RLHF completions to SwanLab.

SwanLabRLHFCompletionCallback

integrations.swanlab.callbacks.SwanLabRLHFCompletionCallback(
    log_interval=100,
    max_completions=128,
    table_name='rlhf_completions',
)

Callback for logging RLHF completions to SwanLab.

This callback periodically logs model completions (prompts, chosen/rejected responses, rewards) to SwanLab during RLHF training for qualitative analysis.

Supports DPO, KTO, ORPO, and GRPO trainers.

Example usage

callback = SwanLabRLHFCompletionCallback(
    log_interval=100,     # Log every 100 steps
    max_completions=128,  # Keep last 128 completions
)
trainer.add_callback(callback)

Attributes

Name          Type        Description
logger                    CompletionLogger instance
log_interval              Number of steps between SwanLab logging
trainer_type  str | None  Auto-detected trainer type (dpo/kto/orpo/grpo)

Methods

Name Description
on_init_end Detect trainer type on initialization.
on_log Capture completions from logs and buffer them.
on_train_end Log remaining completions at end of training.

on_init_end
integrations.swanlab.callbacks.SwanLabRLHFCompletionCallback.on_init_end(
    args,
    state,
    control,
    **kwargs,
)

Detect trainer type on initialization.
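
How the detection is performed is not spelled out here. A minimal sketch, assuming the type is inferred from the trainer's class name; the `detect_trainer_type` helper and the name-matching rule are illustrative assumptions, not the module's actual code:

```python
from typing import Optional

# Trainer types the callback is documented to support.
SUPPORTED_TYPES = ("dpo", "kto", "orpo", "grpo")


def detect_trainer_type(trainer: object) -> Optional[str]:
    """Guess the RLHF trainer type from the class name (hypothetical helper)."""
    class_name = type(trainer).__name__.lower()
    for trainer_type in SUPPORTED_TYPES:
        if trainer_type in class_name:
            return trainer_type
    return None  # Unknown trainer: leave trainer_type unset


class DPOTrainer:  # Stand-in for a real trainer class
    pass


print(detect_trainer_type(DPOTrainer()))
```

Returning `None` for an unrecognized trainer matches the documented `str | None` type of the `trainer_type` attribute.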

on_log
integrations.swanlab.callbacks.SwanLabRLHFCompletionCallback.on_log(
    args,
    state,
    control,
    logs=None,
    **kwargs,
)

Capture completions from logs and buffer them.

Different trainers log completions in different formats:

- DPO: logs['dpo/chosen'], logs['dpo/rejected'], logs['dpo/reward_diff']
- KTO: logs['kto/completion'], logs['kto/label'], logs['kto/reward']
- ORPO: logs['orpo/chosen'], logs['orpo/rejected']
- GRPO: logs['grpo/completion'], logs['grpo/reward']

Note: This is a placeholder implementation. Actual log keys depend on the TRL trainer implementation. You may need to patch the trainers to expose completion data in logs.
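
The key lookup described above could be sketched roughly as follows. The `LOG_KEYS` table and the `extract_completion` helper are hypothetical names introduced for illustration; the keys themselves are the placeholder keys listed above, which real TRL trainers may not emit:

```python
from typing import Any, Dict, Optional

# Placeholder log keys per trainer type, taken from the list above.
LOG_KEYS: Dict[str, Dict[str, str]] = {
    "dpo": {
        "chosen": "dpo/chosen",
        "rejected": "dpo/rejected",
        "reward_diff": "dpo/reward_diff",
    },
    "kto": {
        "completion": "kto/completion",
        "label": "kto/label",
        "reward": "kto/reward",
    },
    "orpo": {"chosen": "orpo/chosen", "rejected": "orpo/rejected"},
    "grpo": {"completion": "grpo/completion", "reward": "grpo/reward"},
}


def extract_completion(
    trainer_type: str, logs: Optional[Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Return one completion record, or None if the logs carry no completion data."""
    if not logs or trainer_type not in LOG_KEYS:
        return None
    # Copy only the keys that are actually present in this log entry.
    record = {
        field: logs[key]
        for field, key in LOG_KEYS[trainer_type].items()
        if key in logs
    }
    return record or None


logs = {"loss": 0.4, "dpo/chosen": "Paris", "dpo/rejected": "Rome"}
print(extract_completion("dpo", logs))
```

Log entries without any completion key (for example, plain loss logs) yield `None`, so a caller can skip them without buffering anything.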

on_train_end
integrations.swanlab.callbacks.SwanLabRLHFCompletionCallback.on_train_end(
    args,
    state,
    control,
    **kwargs,
)

Log remaining completions at end of training.
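
To illustrate how `log_interval`, `max_completions`, and the final flush might interact, here is a minimal sketch. The `CompletionBuffer` class and its `sink` parameter are hypothetical, and the assumption that a flush clears the buffer is not confirmed by this page; the real `CompletionLogger` may behave differently:

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List


class CompletionBuffer:
    """Bounded buffer that keeps only the most recent completions (illustrative)."""

    def __init__(self, max_completions: int = 128) -> None:
        self._items: Deque[Dict[str, Any]] = deque(maxlen=max_completions)

    def add(self, record: Dict[str, Any]) -> None:
        self._items.append(record)  # Oldest record is dropped once the buffer is full

    def flush(self, sink: Callable[[List[Dict[str, Any]]], None]) -> int:
        """Send buffered records to `sink`, clear the buffer, return the count."""
        if not self._items:
            return 0
        records = list(self._items)
        sink(records)
        self._items.clear()
        return len(records)


sent: List[List[Dict[str, Any]]] = []  # Stand-in for the SwanLab logging call
buffer = CompletionBuffer(max_completions=3)
log_interval = 2

for step in range(1, 6):
    buffer.add({"step": step})
    if step % log_interval == 0:
        buffer.flush(sent.append)  # Periodic flush every log_interval steps

buffer.flush(sent.append)  # Final flush, as on_train_end would do
```

Without the final flush, the record buffered at step 5 would never be sent, which is the case `on_train_end` covers.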