utils.callbacks
Callbacks for Trainer class
Classes
| Name | Description |
|---|---|
| GCCallback | Runs gc.collect() + torch.cuda.empty_cache() on gc_collect_steps intervals and on eval/save/epoch boundaries |
| LossWatchDogCallback | Callback to track loss and stop training if loss is too high |
| SaveAxolotlConfigtoWandBCallback | Callback to save axolotl config to wandb |
| SaveModelOnFirstStepCallback | Callback to save the model on the first step of training if enabled |
| SkipEvalOnResumeCallback | Skip the redundant evaluation that fires when resuming from a checkpoint |
GCCallback
utils.callbacks.GCCallback(gc_collect_steps=-1, gc_steps=None)
Runs gc.collect() + torch.cuda.empty_cache() on
gc_collect_steps intervals and on eval/save/epoch boundaries that
the Trainer’s native torch_empty_cache_steps doesn’t cover. The two
settings are complementary; overlapping intervals just double-clear.
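The interval-plus-boundary behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the axolotl implementation: the class name `PeriodicGC`, the `State` stand-in, the `collections` counter, and the reading of a non-positive `gc_collect_steps` as "interval disabled" are all assumptions made for the example.

```python
import gc
from dataclasses import dataclass

try:  # torch is optional for this sketch
    import torch
except ImportError:
    torch = None


@dataclass
class State:
    """Hypothetical stand-in for the Trainer state object."""
    global_step: int = 0


class PeriodicGC:
    """Illustrative callback: collect garbage on an interval and on boundaries."""

    def __init__(self, gc_collect_steps: int = -1):
        self.gc_collect_steps = gc_collect_steps
        self.collections = 0  # counter kept only so the sketch is observable

    def _collect(self) -> None:
        gc.collect()
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()
        self.collections += 1

    def on_step_end(self, state: State) -> None:
        # Assumption: a non-positive interval disables step-based collection.
        if self.gc_collect_steps > 0 and state.global_step % self.gc_collect_steps == 0:
            self._collect()

    # Boundary hooks always collect, independent of the step interval.
    def on_evaluate(self, state: State) -> None:
        self._collect()

    def on_save(self, state: State) -> None:
        self._collect()

    def on_epoch_end(self, state: State) -> None:
        self._collect()
```

With `gc_collect_steps=2`, stepping through steps 1 to 4 triggers two collections, and each eval, save, or epoch boundary adds one more.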
LossWatchDogCallback
utils.callbacks.LossWatchDogCallback(cfg)
Callback to track loss and stop training if loss is too high
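One plausible way to implement such a watchdog is to count consecutive logged losses above a threshold and request a stop once the count reaches a patience limit. The sketch below assumes exactly that; the `threshold` and `patience` parameters, the `LossWatchdog` class, and the `Control` stand-in are hypothetical, and the real callback reads its settings from `cfg`. The `should_training_stop` flag mirrors the field of the same name on Hugging Face's `TrainerControl`.

```python
from dataclasses import dataclass


@dataclass
class Control:
    """Hypothetical stand-in for the Trainer control object."""
    should_training_stop: bool = False


class LossWatchdog:
    """Illustrative watchdog: stop after `patience` consecutive high losses."""

    def __init__(self, threshold: float, patience: int = 3):
        self.threshold = threshold  # assumed setting, not taken from the source
        self.patience = patience    # assumed setting, not taken from the source
        self.violations = 0

    def on_log(self, control: Control, logs: dict) -> Control:
        loss = logs.get("loss")
        if loss is None:
            return control  # eval-only or summary logs carry no training loss
        if loss > self.threshold:
            self.violations += 1
            if self.violations >= self.patience:
                control.should_training_stop = True
        else:
            self.violations = 0  # any healthy loss resets the streak
        return control
```

Resetting the counter on a healthy loss means a single spike does not end the run; only a sustained streak does.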
SaveAxolotlConfigtoWandBCallback
utils.callbacks.SaveAxolotlConfigtoWandBCallback(axolotl_config_path)
Callback to save axolotl config to wandb
SaveModelOnFirstStepCallback
utils.callbacks.SaveModelOnFirstStepCallback()
Callback to save the model on the first step of training if enabled
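As a rough sketch of the idea, a callback can request a checkpoint by setting the save flag once the first optimizer step has completed. The `SaveOnFirstStep` class and the `State` and `Control` stand-ins below are hypothetical, and the choice of the `on_step_end` hook is an assumption; `should_save` mirrors the field of the same name on Hugging Face's `TrainerControl`.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical stand-in for the Trainer state object."""
    global_step: int = 0


@dataclass
class Control:
    """Hypothetical stand-in for the Trainer control object."""
    should_save: bool = False


class SaveOnFirstStep:
    """Illustrative callback: request a checkpoint right after step 1."""

    def on_step_end(self, state: State, control: Control) -> Control:
        if state.global_step == 1:
            control.should_save = True
        return control
```

Saving this early is a cheap way to confirm that checkpoint writing works before committing to a long run.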
SkipEvalOnResumeCallback
utils.callbacks.SkipEvalOnResumeCallback()
Skip the redundant evaluation that fires when resuming from a checkpoint
whose step aligns with eval_steps.
When HuggingFace Trainer resumes, it restores global_step from the
checkpoint and immediately triggers _maybe_log_save_evaluate for that
step. Because the evaluation was already performed during the original
run, repeating it wastes time and pollutes metric logs.
This callback records the global_step at the start of training (i.e.
the checkpoint step when resuming, or 0 for a fresh run) and suppresses
any evaluation request on that exact step.
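The record-then-suppress logic described above might look like the following sketch. The `SkipEvalOnResume` class and the `State` and `Control` stand-ins are hypothetical, and the choice of `on_step_end` as the suppression hook is an assumption; `should_evaluate` mirrors the field of the same name on Hugging Face's `TrainerControl`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class State:
    """Hypothetical stand-in for the Trainer state object."""
    global_step: int = 0


@dataclass
class Control:
    """Hypothetical stand-in for the Trainer control object."""
    should_evaluate: bool = False


class SkipEvalOnResume:
    """Illustrative callback: drop the evaluation request on the resume step."""

    def __init__(self) -> None:
        self.start_step: Optional[int] = None

    def on_train_begin(self, state: State, control: Control) -> Control:
        # Checkpoint step when resuming, 0 for a fresh run.
        self.start_step = state.global_step
        return control

    def on_step_end(self, state: State, control: Control) -> Control:
        # Only the exact starting step is suppressed; later steps are untouched.
        if control.should_evaluate and state.global_step == self.start_step:
            control.should_evaluate = False
        return control
```

For a run resumed at step 100 with `eval_steps` dividing 100, the request at step 100 is cleared while the one at step 200 goes through unchanged.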