utils.callbacks.opentelemetry

OpenTelemetry metrics callback for Axolotl training.

Classes

Name Description
OpenTelemetryMetricsCallback TrainerCallback that exports training metrics to OpenTelemetry/Prometheus.

OpenTelemetryMetricsCallback

utils.callbacks.opentelemetry.OpenTelemetryMetricsCallback(cfg)

TrainerCallback that exports training metrics to OpenTelemetry/Prometheus.

This callback automatically tracks key training metrics, including:

- Training loss
- Evaluation loss
- Learning rate
- Epoch progress
- Global step count
- Gradient norm

Metrics are exposed via an HTTP endpoint for Prometheus scraping.
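To check by hand what a scrape of such an endpoint returns, the Prometheus text exposition format can be read with the standard library alone. The sketch below is illustrative rather than part of Axolotl: the URL `http://localhost:8000/metrics`, the helper names `parse_prometheus_text` and `scrape`, and the metric names in `SAMPLE` are all assumptions, so substitute the host, port, and names your run actually uses.

```python
from typing import Dict
from urllib.request import urlopen

# Hypothetical scrape output; the metric names are illustrative only.
SAMPLE = """\
# HELP axolotl_train_loss Current training loss
# TYPE axolotl_train_loss gauge
axolotl_train_loss 0.42
# TYPE axolotl_learning_rate gauge
axolotl_learning_rate 2e-05
axolotl_global_step{job="demo run"} 120 1700000000000
"""


def parse_prometheus_text(body: str) -> Dict[str, float]:
    """Return {sample name (with labels, if any): value} for each sample line."""
    samples: Dict[str, float] = {}
    for raw in body.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Label values may contain spaces, so split after the closing brace.
        if "}" in line:
            cut = line.rindex("}") + 1
            name, rest = line[:cut], line[cut:].split()
        else:
            name, *rest = line.split()
        if not rest:
            continue
        try:
            # The first field is the value; an optional timestamp may follow.
            samples[name] = float(rest[0])
        except ValueError:
            continue
    return samples


def scrape(url: str = "http://localhost:8000/metrics") -> Dict[str, float]:
    """Fetch and parse a metrics page. The default URL is an assumption."""
    with urlopen(url, timeout=5) as response:
        return parse_prometheus_text(response.read().decode("utf-8"))


if __name__ == "__main__":
    for name, value in parse_prometheus_text(SAMPLE).items():
        print(f"{name} = {value}")
```

Calling `scrape()` against a live training run should return the same kind of dictionary, provided the endpoint is reachable at the assumed address.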

Methods

Name Description
on_evaluate Called after evaluation
on_log Called when logging occurs
on_step_end Called at the end of each training step
on_train_begin Called at the beginning of training
on_train_end Called at the end of training

on_evaluate
utils.callbacks.opentelemetry.OpenTelemetryMetricsCallback.on_evaluate(
    args,
    state,
    control,
    metrics=None,
    **kwargs,
)

Called after evaluation
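As a rough illustration of how a hook with this signature might consume `metrics`, the stand-in below keeps the latest numeric evaluation values and tolerates a call with no metrics. `EvalMetricsRecorder` is a hypothetical class written for this example, not Axolotl's implementation; the `eval_loss` and `eval_runtime` keys follow the usual Hugging Face Trainer naming.

```python
from numbers import Real
from typing import Dict, Optional


class EvalMetricsRecorder:
    """Hypothetical stand-in that mirrors the on_evaluate hook signature."""

    def __init__(self) -> None:
        self.latest: Dict[str, float] = {}

    def on_evaluate(self, args, state, control, metrics: Optional[dict] = None, **kwargs):
        # The hook may be called without metrics, so treat None as empty.
        for key, value in (metrics or {}).items():
            # Keep only real numbers; bool is a Real subclass, so exclude it.
            if isinstance(value, Real) and not isinstance(value, bool):
                self.latest[key] = float(value)
        return control


if __name__ == "__main__":
    recorder = EvalMetricsRecorder()
    recorder.on_evaluate(None, None, None, metrics={"eval_loss": 1.25, "note": "skip"})
    print(recorder.latest)
```

A real exporter would push each stored value to a gauge instead of a dictionary; the filtering and the `None` guard stay the same.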

on_log
utils.callbacks.opentelemetry.OpenTelemetryMetricsCallback.on_log(
    args,
    state,
    control,
    logs=None,
    **kwargs,
)

Called when logging occurs
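The `logs` dictionary passed to this hook is where a Hugging Face Trainer normally reports `loss`, `learning_rate`, `grad_norm`, and `epoch`. A minimal sketch of turning those entries into gauge values follows; the gauge names and the `gauges_from_logs` helper are assumptions made for illustration and may not match the names the callback registers.

```python
from typing import Dict, Optional

# Trainer log keys mapped to hypothetical gauge names.
LOG_KEY_TO_GAUGE = {
    "loss": "train_loss",
    "learning_rate": "learning_rate",
    "grad_norm": "grad_norm",
    "epoch": "epoch",
}


def gauges_from_logs(logs: Optional[dict]) -> Dict[str, float]:
    """Pick the known keys out of a log dict and coerce them to float."""
    gauges: Dict[str, float] = {}
    for key, gauge in LOG_KEY_TO_GAUGE.items():
        value = (logs or {}).get(key)
        if value is None:
            continue
        try:
            gauges[gauge] = float(value)
        except (TypeError, ValueError):
            # Ignore entries that cannot be represented as a number.
            continue
    return gauges


if __name__ == "__main__":
    print(gauges_from_logs({"loss": 0.9, "learning_rate": 1e-4, "epoch": 0.5}))
```

Keys that are absent from a given log call, such as `grad_norm` on an evaluation-only log, are simply skipped, so earlier gauge values would be left unchanged.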

on_step_end
utils.callbacks.opentelemetry.OpenTelemetryMetricsCallback.on_step_end(
    args,
    state,
    control,
    **kwargs,
)

Called at the end of each training step

on_train_begin
utils.callbacks.opentelemetry.OpenTelemetryMetricsCallback.on_train_begin(
    args,
    state,
    control,
    **kwargs,
)

Called at the beginning of training

on_train_end
utils.callbacks.opentelemetry.OpenTelemetryMetricsCallback.on_train_end(
    args,
    state,
    control,
    **kwargs,
)

Called at the end of training
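To see how the five hooks relate to one another, the sketch below drives a tracing stand-in through a simulated run. Everything here is hypothetical: `TracingCallback` and `run_fake_training` exist only for this example, `state` is a `SimpleNamespace` carrying the `global_step` and `epoch` attributes a real `TrainerState` provides, and the call order is the one a Trainer typically follows rather than one taken from its source.

```python
from types import SimpleNamespace
from typing import List


class TracingCallback:
    """Hypothetical callback that records which hooks fire, in order."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_train_begin(self, args, state, control, **kwargs):
        self.events.append("train_begin")

    def on_step_end(self, args, state, control, **kwargs):
        # Step and epoch progress are read from the state object.
        self.events.append(f"step_end:{state.global_step}")

    def on_log(self, args, state, control, logs=None, **kwargs):
        self.events.append("log")

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        self.events.append("evaluate")

    def on_train_end(self, args, state, control, **kwargs):
        self.events.append("train_end")


def run_fake_training(callback, total_steps=4, logging_steps=2, eval_steps=4):
    """Call the hooks in the assumed order, without any real training."""
    state = SimpleNamespace(global_step=0, epoch=0.0)
    args = control = None
    callback.on_train_begin(args, state, control)
    for step in range(1, total_steps + 1):
        state.global_step = step
        state.epoch = step / total_steps
        callback.on_step_end(args, state, control)
        if step % logging_steps == 0:
            callback.on_log(args, state, control, logs={"loss": 1.0 / step})
        if step % eval_steps == 0:
            callback.on_evaluate(args, state, control, metrics={"eval_loss": 1.0 / step})
    callback.on_train_end(args, state, control)
    return callback.events


if __name__ == "__main__":
    print(run_fake_training(TracingCallback()))
```

With the default arguments, logging fires every second step and evaluation once at the final step, which makes it easy to see where each metric update would land.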