utils.callbacks

Callbacks for Trainer class

Classes

Name                              Description
GCCallback                        Runs gc.collect() + torch.cuda.empty_cache() on gc_collect_steps intervals and on eval/save/epoch boundaries
LossWatchDogCallback              Callback to track loss and stop training if loss is too high
SaveAxolotlConfigtoWandBCallback  Callback to save axolotl config to wandb
SaveModelOnFirstStepCallback      Callback to save the model on the first step of training if enabled
SkipEvalOnResumeCallback          Skip the redundant evaluation that fires when resuming from a checkpoint

GCCallback

utils.callbacks.GCCallback(gc_collect_steps=-1, gc_steps=None)

Runs gc.collect() + torch.cuda.empty_cache() on gc_collect_steps intervals and on eval/save/epoch boundaries that the Trainer’s native torch_empty_cache_steps doesn’t cover. The two settings are complementary; overlapping intervals just double-clear.
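The interval-plus-boundary behavior described above can be sketched roughly as follows. This is a dependency-free illustration, not the library's implementation: the class `IntervalGCSketch`, the helper `clear_memory`, and the `clears` counter are hypothetical, the hook names merely mirror those of `transformers.TrainerCallback`, and treating a non-positive `gc_collect_steps` as "interval clearing disabled" is an assumption. The `gc_steps` parameter is not modeled.

```python
import gc
from types import SimpleNamespace

try:
    import torch
except ImportError:  # torch is optional for this sketch
    torch = None


def clear_memory():
    """Run the Python garbage collector, then release cached CUDA memory if available."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


class IntervalGCSketch:
    """Hypothetical stand-in for a Trainer callback; hook names mirror TrainerCallback."""

    def __init__(self, gc_collect_steps=-1):
        self.gc_collect_steps = gc_collect_steps
        self.clears = 0  # counter kept only so the sketch is easy to inspect

    def _clear(self):
        clear_memory()
        self.clears += 1

    def on_step_end(self, args, state, control, **kwargs):
        # Interval clearing; a non-positive interval disables it (assumption).
        if self.gc_collect_steps > 0 and state.global_step % self.gc_collect_steps == 0:
            self._clear()

    # Boundary clearing: evaluation, checkpoint save, and end of epoch.
    def on_evaluate(self, args, state, control, **kwargs):
        self._clear()

    def on_save(self, args, state, control, **kwargs):
        self._clear()

    def on_epoch_end(self, args, state, control, **kwargs):
        self._clear()


# Drive the hooks by hand: six training steps, then one evaluation.
cb = IntervalGCSketch(gc_collect_steps=2)
for step in range(1, 7):
    cb.on_step_end(None, SimpleNamespace(global_step=step), None)
cb.on_evaluate(None, SimpleNamespace(global_step=6), None)
```

With an interval of 2, the sketch clears memory on steps 2, 4, and 6, and once more at the evaluation boundary.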

LossWatchDogCallback

utils.callbacks.LossWatchDogCallback(cfg)

Callback to track loss and stop training if loss is too high
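One plausible way such a watchdog could work is sketched below. The page does not say how "too high" is decided, so the `threshold` and `patience` parameters are assumptions made for illustration, as is the class name `LossWatchdogSketch`. The `log_history` and `should_training_stop` attribute names follow the Hugging Face `TrainerState` and `TrainerControl` objects, which are replaced here by simple stand-ins.

```python
from types import SimpleNamespace


class LossWatchdogSketch:
    """Hypothetical sketch: stop after `patience` consecutive losses above `threshold`."""

    def __init__(self, threshold, patience=3):
        self.threshold = threshold  # assumption: loss above this counts as a violation
        self.patience = patience    # assumption: consecutive violations tolerated
        self.violations = 0

    def on_step_end(self, args, state, control, **kwargs):
        history = state.log_history
        if history and "loss" in history[-1]:
            if history[-1]["loss"] > self.threshold:
                self.violations += 1
                if self.violations >= self.patience:
                    control.should_training_stop = True
            else:
                # A healthy loss resets the streak.
                self.violations = 0
        return control


# Stand-ins for TrainerState and TrainerControl.
state = SimpleNamespace(log_history=[])
control = SimpleNamespace(should_training_stop=False)
watchdog = LossWatchdogSketch(threshold=5.0, patience=2)

stopped_at = None
for step, loss in enumerate([2.1, 7.5, 3.0, 8.2, 9.9, 1.0], start=1):
    state.log_history.append({"loss": loss, "step": step})
    watchdog.on_step_end(None, state, control)
    if control.should_training_stop:
        stopped_at = step
        break
```

In this run the single spike at step 2 is forgiven because step 3 recovers, while the two consecutive spikes at steps 4 and 5 trip the stop flag.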

SaveAxolotlConfigtoWandBCallback

utils.callbacks.SaveAxolotlConfigtoWandBCallback(axolotl_config_path)

Callback to save axolotl config to wandb

SaveModelOnFirstStepCallback

utils.callbacks.SaveModelOnFirstStepCallback()

Callback to save the model on the first step of training if enabled
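A minimal sketch of the idea, assuming the save is requested by setting the `should_save` flag that Hugging Face's `TrainerControl` exposes. The class name `SaveOnFirstStepSketch` is hypothetical, and the `SimpleNamespace` objects stand in for the real state and control objects. How the callback is enabled is not stated on this page and is not modeled here.

```python
from types import SimpleNamespace


class SaveOnFirstStepSketch:
    """Hypothetical sketch: request a checkpoint once the first step has finished."""

    def on_step_end(self, args, state, control, **kwargs):
        if state.global_step == 1:
            control.should_save = True
        return control


cb = SaveOnFirstStepSketch()
control = SimpleNamespace(should_save=False)

cb.on_step_end(None, SimpleNamespace(global_step=1), control)
requested_on_first_step = control.should_save

control.should_save = False  # the trainer resets the flag after saving
cb.on_step_end(None, SimpleNamespace(global_step=2), control)
requested_on_second_step = control.should_save
```

Saving this early can surface checkpoint-writing problems (permissions, disk space, serialization errors) before a long run has invested significant compute.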

SkipEvalOnResumeCallback

utils.callbacks.SkipEvalOnResumeCallback()

Skip the redundant evaluation that fires when resuming from a checkpoint whose step aligns with eval_steps.

When the Hugging Face Trainer resumes, it restores global_step from the checkpoint and immediately triggers _maybe_log_save_evaluate for that step. Because that evaluation was already performed during the original run, repeating it wastes time and pollutes the metric logs.

This callback records the global_step at the start of training (i.e. the checkpoint step when resuming, or 0 for a fresh run) and suppresses any evaluation request on that exact step.
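The record-then-suppress logic described above might look roughly like this. It is a sketch under assumptions rather than the actual implementation: the class name `SkipEvalOnResumeSketch` is hypothetical, the choice of `on_step_end` as the hook that clears the flag is an assumption, and the `SimpleNamespace` objects stand in for `TrainerState` and `TrainerControl`.

```python
from types import SimpleNamespace


class SkipEvalOnResumeSketch:
    """Hypothetical sketch: drop an evaluation request on the step training started from."""

    def __init__(self):
        self.start_step = None

    def on_train_begin(self, args, state, control, **kwargs):
        # The checkpoint step when resuming, or 0 for a fresh run.
        self.start_step = state.global_step

    def on_step_end(self, args, state, control, **kwargs):
        if control.should_evaluate and state.global_step == self.start_step:
            control.should_evaluate = False
        return control


# Simulate resuming from a checkpoint saved at step 100.
cb = SkipEvalOnResumeSketch()
state = SimpleNamespace(global_step=100)
cb.on_train_begin(None, state, SimpleNamespace(should_evaluate=False))

control = SimpleNamespace(should_evaluate=True)  # evaluation requested at step 100
cb.on_step_end(None, state, control)
eval_at_resume_step = control.should_evaluate

state.global_step = 150
control = SimpleNamespace(should_evaluate=True)  # next scheduled evaluation
cb.on_step_end(None, state, control)
eval_at_later_step = control.should_evaluate
```

Only the exact starting step is affected, so every later scheduled evaluation still runs.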