integrations.swanlab.completion_logger

SwanLab completion logger for RLHF/DPO/KTO/ORPO/GRPO training.

This module provides utilities for logging model completions during preference training to SwanLab for qualitative analysis.

Classes

Name Description
CompletionLogger Memory-bounded logger for RLHF completions.

CompletionLogger

integrations.swanlab.completion_logger.CompletionLogger(maxlen=128)

Memory-bounded logger for RLHF completions.

Stores prompts, completions, and rewards in fixed-size deques to prevent memory leaks during long training runs. Logs completion tables to SwanLab for qualitative analysis of model outputs.

Example usage

logger = CompletionLogger(maxlen=128)
logger.add_dpo_completion(
    step=0,
    prompt="What is AI?",
    chosen="Artificial Intelligence is...",
    rejected="AI means...",
    reward_diff=0.5,
)
logger.log_to_swanlab()

Attributes

Name Type Description
maxlen int Maximum number of completions to store; once the buffer is full, the oldest entries are dropped
data deque[Mapping[str, Any]] Deque storing completion dictionaries
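
The memory bound described above comes from the behavior of a fixed-size deque. The following is a minimal sketch of that behavior using only the standard library; the `buffer` variable and the record keys are hypothetical stand-ins, not the class's actual internals.

```python
from collections import deque

# Hypothetical stand-in for the logger's buffer; the real class may differ.
buffer = deque(maxlen=3)

for step in range(5):
    # Appending to a full deque silently discards the oldest record.
    buffer.append({"step": step, "prompt": f"prompt {step}"})

# Only the three most recent records remain; steps 0 and 1 were dropped.
steps = [record["step"] for record in buffer]
print(steps)  # [2, 3, 4]
```

Because the deque discards entries on its own, memory use stays bounded by `maxlen` regardless of how long training runs.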

Methods

Name Description
add_dpo_completion Add a DPO completion to the buffer.
add_grpo_completion Add a GRPO completion to the buffer.
add_kto_completion Add a KTO completion to the buffer.
add_orpo_completion Add an ORPO completion to the buffer.
clear Clear all buffered completions.
log_to_swanlab Log buffered completions to SwanLab as a table.
add_dpo_completion
integrations.swanlab.completion_logger.CompletionLogger.add_dpo_completion(
    step,
    prompt,
    chosen,
    rejected,
    reward_diff=None,
)

Add a DPO completion to the buffer.

Parameters
Name Type Description Default
step int Training step number required
prompt str Input prompt required
chosen str Chosen (preferred) completion required
rejected str Rejected (non-preferred) completion required
reward_diff float | None Reward difference (chosen - rejected), if available None
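
Since `reward_diff` is defined as chosen minus rejected and may be omitted, a caller might compute it along these lines. This is a sketch that assumes one scalar reward per completion is available; `reward_margin` is a hypothetical helper, not part of the module.

```python
from typing import Optional


def reward_margin(
    chosen_reward: Optional[float],
    rejected_reward: Optional[float],
) -> Optional[float]:
    """Return chosen - rejected, or None when either reward is unavailable."""
    if chosen_reward is None or rejected_reward is None:
        return None
    return chosen_reward - rejected_reward


margin = reward_margin(1.25, 0.75)   # positive: the chosen completion scored higher
missing = reward_margin(1.25, None)  # None, matching the parameter's default
```

Either result could then be passed as `reward_diff`.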
add_grpo_completion
integrations.swanlab.completion_logger.CompletionLogger.add_grpo_completion(
    step,
    prompt,
    completion,
    reward=None,
    advantage=None,
)

Add a GRPO completion to the buffer.

Parameters
Name Type Description Default
step int Training step number required
prompt str Input prompt required
completion str Model-generated completion required
reward float | None Reward score from reward model None
advantage float | None Advantage estimate (reward - baseline) None
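
The `advantage` parameter is described as reward minus baseline. One common choice of baseline in GRPO is the mean reward over the group of completions sampled for the same prompt. The sketch below assumes that choice; `group_advantages` is a hypothetical helper, and some implementations additionally divide by the group's standard deviation.

```python
from statistics import fmean
from typing import List


def group_advantages(rewards: List[float]) -> List[float]:
    """Subtract the group mean (assumed baseline) from each reward."""
    baseline = fmean(rewards)
    return [reward - baseline for reward in rewards]


# Rewards for four completions sampled from the same prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_advantages(rewards)
print(advantages)  # [0.5, -0.5, 0.0, 0.0]
```

Each completion's reward and advantage could then be logged together in one `add_grpo_completion` call.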
add_kto_completion
integrations.swanlab.completion_logger.CompletionLogger.add_kto_completion(
    step,
    prompt,
    completion,
    label,
    reward=None,
)

Add a KTO completion to the buffer.

Parameters
Name Type Description Default
step int Training step number required
prompt str Input prompt required
completion str Model-generated completion required
label bool True if desirable, False if undesirable required
reward float | None Reward score, if available None
add_orpo_completion
integrations.swanlab.completion_logger.CompletionLogger.add_orpo_completion(
    step,
    prompt,
    chosen,
    rejected,
    log_odds_ratio=None,
)

Add an ORPO completion to the buffer.

Parameters
Name Type Description Default
step int Training step number required
prompt str Input prompt required
chosen str Chosen (preferred) completion required
rejected str Rejected (non-preferred) completion required
log_odds_ratio float | None Log odds ratio between chosen and rejected None
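
For reference, the log odds ratio in ORPO compares the odds p / (1 - p) of the chosen and rejected completions, where p is typically a length-normalized sequence probability. The sketch below computes it from average log-probabilities; the function names are hypothetical, and how the trainer actually produces this value is not stated here.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) computed from log p, for p strictly between 0 and 1."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def log_odds_ratio(chosen_avg_logp: float, rejected_avg_logp: float) -> float:
    """Log of odds(chosen) / odds(rejected)."""
    return log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)


# odds(0.5) = 1 and odds(0.2) = 0.25, so the ratio is 4 and its log is log(4).
ratio = log_odds_ratio(math.log(0.5), math.log(0.2))
```

A positive value means the model assigns higher odds to the chosen completion than to the rejected one.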
clear
integrations.swanlab.completion_logger.CompletionLogger.clear()

Clear all buffered completions.

log_to_swanlab
integrations.swanlab.completion_logger.CompletionLogger.log_to_swanlab(
    table_name='completions',
)

Log buffered completions to SwanLab as a table.

Builds a SwanLab echarts Table from all buffered completions. Nothing is logged unless SwanLab is initialized and the buffer contains data.

Parameters
Name Type Description Default
table_name str Name of the table in the SwanLab dashboard 'completions'
Returns
Name Type Description
bool True if logging succeeded, False otherwise
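
Turning buffered records into a table requires a shared set of column headers and one row per record. The following is a rough sketch of how such a conversion might look using only the standard library. The record keys and the `records_to_table` helper are hypothetical, and no SwanLab call is made here.

```python
from collections import deque
from typing import Any, Iterable, List, Mapping, Tuple


def records_to_table(
    records: Iterable[Mapping[str, Any]],
) -> Tuple[List[str], List[List[Any]]]:
    """Flatten records into (headers, rows); missing values become ''."""
    records = list(records)
    headers: List[str] = []
    for record in records:
        for key in record:
            if key not in headers:
                headers.append(key)  # keep first-seen column order
    rows = [[record.get(key, "") for key in headers] for record in records]
    return headers, rows


# Hypothetical buffered records; the real dictionaries may use other keys.
buffer = deque(maxlen=128)
buffer.append({"step": 0, "prompt": "What is AI?", "chosen": "A", "rejected": "B"})
buffer.append(
    {"step": 1, "prompt": "Define RL.", "chosen": "C", "rejected": "D", "reward_diff": 0.5}
)

headers, rows = records_to_table(buffer)
```

Records logged without an optional field get an empty cell, so every row has the same number of columns as the header.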