utils.callbacks.dynamic_checkpoint


Classes

| Name | Description |
|---|---|
| DynamicCheckpointCallback | Callback to save checkpoints on demand during training. |

DynamicCheckpointCallback

utils.callbacks.dynamic_checkpoint.DynamicCheckpointCallback(cfg)

Callback to save checkpoints on demand during training via:

1. File-based trigger (works everywhere; only rank 0 checks for the file).

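Designed for multi-GPU distributed training: only rank 0 checks the trigger, and all ranks then synchronize on the result so they agree on whether to save.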

Usage

File-based trigger (create the trigger file in the training output directory):

touch /path/to/output_dir/axolotl_checkpoint.save
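The file-based trigger can be sketched as a check-and-consume step. This is a minimal sketch, not axolotl's implementation: the helper name `consume_save_trigger` is hypothetical, and deleting the file after it is detected (so that one `touch` requests one checkpoint) is an assumption.

```python
import tempfile
from pathlib import Path

# File name taken from the usage example; the directory is the training output dir.
TRIGGER_NAME = "axolotl_checkpoint.save"


def consume_save_trigger(output_dir: str, name: str = TRIGGER_NAME) -> bool:
    """Hypothetical helper: return True once per trigger file.

    Assumption: the trigger file is removed after it is seen, so a single
    `touch` requests exactly one checkpoint.
    """
    trigger = Path(output_dir) / name
    if not trigger.exists():
        return False
    # missing_ok guards against the file vanishing between the check and the unlink
    trigger.unlink(missing_ok=True)
    return True


# Usage: simulate `touch <output_dir>/axolotl_checkpoint.save`, then poll twice.
with tempfile.TemporaryDirectory() as output_dir:
    Path(output_dir, TRIGGER_NAME).touch()
    first = consume_save_trigger(output_dir)   # trigger present: request a save
    second = consume_save_trigger(output_dir)  # already consumed: no save
```

Consuming the file keeps a single `touch` from producing a checkpoint on every subsequent step; an implementation that leaves the file in place would need some other way to avoid repeated saves.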

Methods

| Name | Description |
|---|---|
| on_step_end | Check for checkpoint triggers at the end of each step. |

on_step_end
utils.callbacks.dynamic_checkpoint.DynamicCheckpointCallback.on_step_end(
    args,
    state,
    control,
    **_kwargs,
)

Check for checkpoint triggers at the end of each step. Only rank 0 checks for the trigger file; all ranks then synchronize so that every rank makes the same save decision.
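The rank-0-then-synchronize pattern can be sketched as follows. This is an illustrative sketch under stated assumptions, not the callback's actual code. The `broadcast` argument stands in for a collective such as `torch.distributed.broadcast_object_list` and is hypothetical here. `Control` is a minimal stand-in for `transformers.TrainerControl`, whose `should_save` flag asks the Trainer to write a checkpoint. `FakeGroup` simulates separate processes by running ranks one after another in a single process.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Control:
    """Minimal stand-in for transformers.TrainerControl (only should_save is modeled)."""
    should_save: bool = False


def on_step_end_sketch(
    rank: int,
    trigger: Path,
    control: Control,
    broadcast: Callable[[bool], bool],
) -> Control:
    # Only rank 0 touches the filesystem; other ranks contribute a placeholder.
    local = False
    if rank == 0 and trigger.exists():
        trigger.unlink(missing_ok=True)  # assumption: the trigger is consumed
        local = True
    # Every rank takes part in the collective and receives rank 0's value.
    if broadcast(local):
        control.should_save = True
    return control


class FakeGroup:
    """Hypothetical single-process stand-in for a broadcast from rank 0.

    Correct only when rank 0 runs before the other ranks at each step.
    """

    def __init__(self) -> None:
        self.value = False

    def for_rank(self, rank: int) -> Callable[[bool], bool]:
        def broadcast(local: bool) -> bool:
            if rank == 0:
                self.value = local  # rank 0 is the source of truth
            return self.value
        return broadcast


with tempfile.TemporaryDirectory() as output_dir:
    trigger = Path(output_dir, "axolotl_checkpoint.save")
    trigger.touch()
    group = FakeGroup()
    # Step with the trigger present: all four simulated ranks decide to save.
    results = [
        on_step_end_sketch(r, trigger, Control(), group.for_rank(r)).should_save
        for r in range(4)
    ]
    # Next step: the trigger was consumed, so no rank saves.
    next_step = [
        on_step_end_sketch(r, trigger, Control(), group.for_rank(r)).should_save
        for r in range(4)
    ]
```

In a real multi-process run, `broadcast` could wrap `torch.distributed.broadcast_object_list([flag], src=0)`. That is a collective, so every rank must call it at every step; skipping it on non-zero ranks would leave rank 0 waiting indefinitely. Having only rank 0 read the filesystem also keeps ranks from disagreeing when the file appears partway through a step.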