kernels.autotune_telemetry
Telemetry for the fused RMSNorm+RoPE Triton autotune selections.
Mirrors the scattermoe-lora autotune telemetry
(axolotl.integrations.kernels.autotune_callback): after the kernel’s
@triton.autotune cache is populated by the first backward pass, report the
selected configs alongside GPU identity so the per-hardware tuning that varies
across architectures can be aggregated.
Classes
| Name | Description |
|---|---|
| FusedRopeAutotuneReportCallback | Reports fused RMSNorm+RoPE autotune selections via telemetry. |
FusedRopeAutotuneReportCallback
kernels.autotune_telemetry.FusedRopeAutotuneReportCallback()
Reports fused RMSNorm+RoPE autotune selections via telemetry.
Fires once after the autotune cache is populated (the first step whose
backward has run). If the cache is still empty, it retries on each step up to
_MAX_POLL_STEP and then gives up. Every later on_step_end short-circuits on
_reported, so there is zero hot-path cost.
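The fire-once, poll-then-give-up behaviour described above can be sketched in plain Python. This is only an illustration of the pattern, not the axolotl implementation: the class name `OneShotReportCallback`, the `collect` and `send` parameters, the simplified `on_step_end(global_step)` signature, and the value chosen for `_MAX_POLL_STEP` are all assumptions made for the example.

```python
class OneShotReportCallback:
    """Hypothetical stand-in for a trainer callback (not the axolotl class)."""

    _MAX_POLL_STEP = 5  # assumed value, for illustration only

    def __init__(self, collect, send):
        self._collect = collect  # callable returning a list of entries, [] if none yet
        self._send = send        # callable that receives the entries exactly once
        self._reported = False

    def on_step_end(self, global_step):
        if self._reported:
            return  # short-circuit: already reported or gave up
        configs = self._collect()
        if configs:
            self._send(configs)
            self._reported = True
        elif global_step >= self._MAX_POLL_STEP:
            self._reported = True  # give up so later steps stop polling


# Simulate an autotune cache that becomes populated on step 2.
state = {"step": 0}
sent = []


def fake_collect():
    if state["step"] >= 2:
        return [{"kernel": "k", "key": "(128,)", "config": {}}]
    return []


cb = OneShotReportCallback(fake_collect, sent.append)
for step in range(1, 6):
    state["step"] = step
    cb.on_step_end(step)
```

In this sketch the single flag covers both outcomes (reported and gave up), so every step after either one costs only an attribute check.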
Functions
| Name | Description |
|---|---|
| collect_fused_rope_autotune_configs | Read the autotune .cache from the fused RMSNorm+RoPE backward kernel. |
collect_fused_rope_autotune_configs
kernels.autotune_telemetry.collect_fused_rope_autotune_configs()
Read the autotune .cache from the fused RMSNorm+RoPE backward kernel.
Each entry is a dict with the keys "kernel", "key", and "config", the same
shape the scattermoe collector emits, so both event types aggregate uniformly.
Returns [] if Triton or the kernel isn’t loaded, or if nothing has been
autotuned yet.
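As a rough illustration of the entry shape, the sketch below walks a dict-like `.cache` attribute and emits one `{"kernel", "key", "config"}` dict per cached selection. It assumes the cache maps a key tuple to a config object exposing `kwargs`, `num_warps`, and `num_stages`; how the real collector serializes the key and config is not stated here, so the `str(key)` conversion and the flattened config dict are guesses. The helper `collect_autotune_entries` and the kernel name `fused_rms_norm_rope_bwd` are hypothetical, and stand-in objects replace Triton so the example runs without a GPU.

```python
from types import SimpleNamespace


def collect_autotune_entries(kernels):
    """Hypothetical helper: kernels maps a name to an object with a .cache dict."""
    entries = []
    for name, kernel in kernels.items():
        cache = getattr(kernel, "cache", None)
        if not cache:
            continue  # kernel not loaded or nothing autotuned yet
        for key, config in cache.items():
            entries.append(
                {
                    "kernel": name,
                    "key": str(key),  # assumed serialization of the autotune key
                    "config": {
                        **getattr(config, "kwargs", {}),
                        "num_warps": getattr(config, "num_warps", None),
                        "num_stages": getattr(config, "num_stages", None),
                    },
                }
            )
    return entries


# Stand-ins for an autotuned kernel and a kernel that has not run yet.
tuned = SimpleNamespace(
    cache={
        (4096, 128): SimpleNamespace(
            kwargs={"BLOCK_SIZE": 256}, num_warps=4, num_stages=2
        )
    }
)
untuned = SimpleNamespace(cache={})

entries = collect_autotune_entries(
    {"fused_rms_norm_rope_bwd": tuned, "not_tuned": untuned}
)
```

An empty or missing cache contributes nothing, which matches the documented behaviour of returning [] when no selection exists yet.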