kernels.autotune_telemetry

Telemetry for the fused RMSNorm+RoPE Triton autotune selections.

Mirrors the scattermoe-lora autotune telemetry (axolotl.integrations.kernels.autotune_callback). Once the first backward pass has populated the kernel’s @triton.autotune cache, this module reports the selected configs alongside the GPU identity. Because the best config varies across GPU architectures, attaching the GPU identity lets the per-hardware selections be aggregated.

Classes

Name Description
FusedRopeAutotuneReportCallback Reports fused RMSNorm+RoPE autotune selections via telemetry.

FusedRopeAutotuneReportCallback

kernels.autotune_telemetry.FusedRopeAutotuneReportCallback()

Reports fused RMSNorm+RoPE autotune selections via telemetry.

Fires once after the autotune cache is populated (the first step whose backward pass has run). If the cache is still empty, it retries on later steps up to _MAX_POLL_STEP and then gives up. Every later on_step_end call short-circuits on _reported, so the hot path pays no extra cost.
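
The fire-once polling behavior described above can be sketched in plain Python. This is a minimal illustration, not the actual callback: the class name OneShotAutotuneReporter, the injected collect and send callables, the simplified on_step_end(step) signature, and the value 5 for _MAX_POLL_STEP are all assumptions made for the example. Setting _reported when giving up is also an assumption about how the retry cutoff is implemented.

```python
from typing import Callable

# Hypothetical value; the real constant's value is not given in this page.
_MAX_POLL_STEP = 5


class OneShotAutotuneReporter:
    """Illustrative stand-in for the fire-once polling pattern."""

    def __init__(self, collect: Callable[[], list], send: Callable[[list], None]):
        self._collect = collect   # returns autotune entries, or [] if none yet
        self._send = send         # delivers the entries to telemetry
        self._reported = False

    def on_step_end(self, step: int) -> None:
        # Fast path: after reporting (or giving up) this is a single flag check.
        if self._reported:
            return
        configs = self._collect()
        if configs:
            self._send(configs)
            self._reported = True
        elif step >= _MAX_POLL_STEP:
            # Cache never filled within the polling window: stop polling.
            self._reported = True
```

With this shape, the collector is called only until it first returns a non-empty list or the step limit is reached, and every later step costs one attribute lookup.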

Functions

Name Description
collect_fused_rope_autotune_configs Read the autotune .cache from the fused RMSNorm+RoPE backward kernel.

collect_fused_rope_autotune_configs

kernels.autotune_telemetry.collect_fused_rope_autotune_configs()

Read the autotune .cache from the fused RMSNorm+RoPE backward kernel.

Each entry is a dict with the keys "kernel", "key", and "config". This is the same shape the scattermoe collector emits, so both event types aggregate uniformly. Returns [] if Triton or the kernel isn’t loaded, or if nothing has been autotuned yet.
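
As a rough sketch of how such a collector might walk an autotune cache, the example below builds entries of that shape from duck-typed stand-ins, so it runs without Triton installed. It relies on the fact that a Triton autotuned kernel exposes a .cache dict mapping a key tuple to the selected config, which carries kwargs, num_warps, and num_stages. The function name collect_autotune_entries, the kernel names, and the exact contents of the "key" and "config" values are assumptions for illustration, not the module's actual output format.

```python
from types import SimpleNamespace


def collect_autotune_entries(kernels):
    """Build {"kernel", "key", "config"} entries from objects with a .cache dict."""
    entries = []
    for name, kernel in kernels.items():
        cache = getattr(kernel, "cache", None)
        if not cache:
            # Kernel missing, not autotuned, or nothing tuned yet.
            continue
        for key, config in cache.items():
            entries.append({
                "kernel": name,
                # Stringify key parts so the entry is JSON-friendly.
                "key": [str(part) for part in key],
                "config": {
                    **dict(getattr(config, "kwargs", {})),
                    "num_warps": getattr(config, "num_warps", None),
                    "num_stages": getattr(config, "num_stages", None),
                },
            })
    return entries


# Stand-ins for autotuned kernels; names and values are hypothetical.
fake_config = SimpleNamespace(kwargs={"BLOCK_SIZE": 128}, num_warps=4, num_stages=2)
tuned = SimpleNamespace(cache={(4096, "torch.bfloat16"): fake_config})
untuned = SimpleNamespace(cache={})

entries = collect_autotune_entries({"fused_rope_bwd": tuned, "other": untuned})
```

Kernels with an empty or missing cache contribute nothing, so the function returns [] when no kernel has been tuned, matching the fallback behavior described above.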