Telemetry

A description of the telemetry implementation in Axolotl.

Telemetry in Axolotl

Axolotl implements anonymous telemetry to help maintainers understand how the library is used and where users encounter issues. This data helps prioritize features, optimize performance, and fix bugs.

Data Collection

We collect:

  • System info: OS, Python version, Axolotl version, PyTorch version, Transformers version, etc.
  • Hardware info: CPU count, memory, GPU count and models
  • Runtime metrics: Training progress, memory usage, timing information
  • Usage patterns: Models (from a whitelist) and configurations used
  • Error tracking: Stack traces and error messages (sanitized to remove personal information)

Personally identifiable information (PII) is not collected.

Implementation

Telemetry is implemented using PostHog and consists of:

  • axolotl.telemetry.TelemetryManager: A singleton class that initializes the telemetry system and provides methods for tracking events.
  • axolotl.telemetry.errors.send_errors: A decorator that captures exceptions and sends sanitized stack traces.
  • axolotl.telemetry.runtime_metrics.RuntimeMetricsTracker: A class that tracks runtime metrics during training.
  • axolotl.telemetry.callbacks.TelemetryCallback: A Trainer callback that sends runtime metrics telemetry.
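
To illustrate the error-capturing decorator pattern described above, here is a minimal sketch. It is not the Axolotl implementation: the names `capture_errors`, `sanitize_trace`, and the `report` callable are hypothetical, and reducing file paths to their base names is an assumed sanitization rule chosen only for illustration.

```python
import functools
import os
import re
import traceback


def sanitize_trace(trace: str) -> str:
    """Replace each file path in a traceback with its base name (assumed rule)."""
    return re.sub(
        r'File "([^"]+)"',
        lambda m: f'File "{os.path.basename(m.group(1))}"',
        trace,
    )


def capture_errors(report):
    """Hypothetical decorator factory: `report` receives one dict per exception."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Send a sanitized event, then re-raise so behavior is unchanged.
                report(
                    {
                        "error_type": type(exc).__name__,
                        "stack": sanitize_trace(traceback.format_exc()),
                    }
                )
                raise

        return wrapper

    return decorator
```

The wrapped function still raises the original exception, so the decorator only observes failures and never swallows them.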

Unless telemetry has been explicitly enabled or disabled, the telemetry system blocks training startup for 10 seconds to ensure users are aware of data collection.

Opt-Out Mechanism

Telemetry is enabled by default on an opt-out basis. To disable it, set AXOLOTL_DO_NOT_TRACK=1 or DO_NOT_TRACK=1.

A warning message will be logged on start to clearly inform users about telemetry. We will remove this after some period.

To hide the telemetry warning message that is displayed when training (and other commands) start up, explicitly set AXOLOTL_DO_NOT_TRACK=0 (enable telemetry) or AXOLOTL_DO_NOT_TRACK=1 (disable telemetry).
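
The rules above can be summarized in a short sketch. This is a hedged illustration, not the library's code: the function names `resolve_telemetry` and `startup_delay_seconds` are hypothetical, and the precedence between the two variables (any opt-out wins) is an assumption that this document does not specify.

```python
import os


def resolve_telemetry(env=None):
    """Return (enabled, explicit) from the documented environment variables."""
    env = os.environ if env is None else env
    axolotl_flag = env.get("AXOLOTL_DO_NOT_TRACK")
    global_flag = env.get("DO_NOT_TRACK")

    # Assumption: an opt-out in either variable takes precedence.
    if axolotl_flag == "1" or global_flag == "1":
        return False, True
    # AXOLOTL_DO_NOT_TRACK=0 explicitly enables telemetry.
    if axolotl_flag == "0":
        return True, True
    # Nothing set: telemetry is on by default, but the choice is not explicit.
    return True, False


def startup_delay_seconds(explicit: bool) -> int:
    """The 10-second warning delay applies only when no explicit choice was made."""
    return 0 if explicit else 10
```

For example, `resolve_telemetry({"DO_NOT_TRACK": "1"})` reports telemetry as disabled with an explicit choice, so no warning delay would apply in this sketch.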

Privacy

  • All path-like config information is automatically redacted from telemetry data
  • Model information is only collected for whitelisted organizations
    • See axolotl/telemetry/whitelist.yaml for the set of whitelisted organizations
  • Each run generates a unique anonymous ID
    • This allows us to link the different telemetry events from a single training run
  • Telemetry is only sent from the main process to avoid duplicate events
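
As a rough illustration of the privacy measures listed above, the sketch below redacts path-like values, filters model names by organization, and generates a per-run ID. All names here (`redact_paths`, `looks_like_path`, `model_if_whitelisted`, `new_run_id`) are hypothetical, and the path heuristic is an assumption; the actual redaction rules and the contents of whitelist.yaml are not reproduced here.

```python
import re
import uuid

REDACTED = "[REDACTED]"


def looks_like_path(value: str) -> bool:
    """Assumed heuristic: POSIX-style prefixes or a Windows drive letter."""
    return value.startswith(("/", "~", "./", "../")) or bool(
        re.match(r"^[A-Za-z]:\\", value)
    )


def redact_paths(config):
    """Recursively replace path-like strings in a nested config."""
    if isinstance(config, dict):
        return {key: redact_paths(value) for key, value in config.items()}
    if isinstance(config, list):
        return [redact_paths(value) for value in config]
    if isinstance(config, str) and looks_like_path(config):
        return REDACTED
    return config


def model_if_whitelisted(model_name: str, allowed_orgs: set):
    """Return the model name only if its organization prefix is allowed."""
    if "/" not in model_name:
        return None
    org = model_name.split("/", 1)[0].lower()
    return model_name if org in allowed_orgs else None


def new_run_id() -> str:
    """A random UUID identifies the run without identifying the user."""
    return str(uuid.uuid4())
```

A random UUID carries no information about the machine or user, which is why it is a common choice for linking events within one run.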