monkeypatch.models.nemotron_h.modeling
Sample-packing and context-parallelism patch for NemotronH (Mamba2/Attention/MoE hybrid).
Threads seq_idx (derived from position_ids) into the Mamba2 SSM kernels so packed-sequence boundaries reset SSM state. Upstream hard-codes seq_idx=None, which leaks hidden state across boundaries. Attention and MoE blocks need no changes — only the Mamba2 mixer is patched.
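As an illustration of how seq_idx can be derived from position_ids, the sketch below assumes that each packed sequence restarts its positions at 0. The helper name seq_idx_from_position_ids is hypothetical, and plain Python lists stand in for tensors; the actual patch may compute this differently.

```python
from itertools import accumulate


def seq_idx_from_position_ids(position_ids):
    """Map packed position_ids to a per-token sequence index.

    Assumption: every packed sequence restarts its positions at 0.
    Hypothetical helper for illustration, not the patch's own code.
    """
    # 1 where a new sequence starts (position 0), else 0.
    starts = [1 if p == 0 else 0 for p in position_ids]
    # Running count of starts, shifted so the first sequence gets index 0.
    return [n - 1 for n in accumulate(starts)]


# Three sequences of lengths 3, 2, and 4 packed into one row.
packed = [0, 1, 2, 0, 1, 0, 1, 2, 3]
seq_idx = seq_idx_from_position_ids(packed)
print(seq_idx)  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
```

Tokens that share a seq_idx value belong to the same original sequence, so a kernel that receives seq_idx can reset its SSM state wherever the value changes.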
Context-parallelism (CP) correction (a ring shift of the SSM state plus an additive output fix) is handled by wrap_mamba_scan_for_cp from mamba_utils, which wraps the mamba_chunk_scan_combined call at the module level.
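To see why an additive fix is enough, consider a scalar toy recurrence h_t = a_t * h_{t-1} + b_t * x_t with output y_t = c_t * h_t. Because the recurrence is linear in the state, a chunk scanned from a zero initial state can be corrected afterwards once the previous chunk's final state arrives. The sketch below demonstrates this on two chunks; scan and correct_chunk are hypothetical names, and this is not the implementation of wrap_mamba_scan_for_cp.

```python
import math


def scan(a, b, x, c, h0=0.0):
    """Sequential scalar SSM scan: returns (outputs, final state)."""
    h = h0
    ys = []
    for a_t, b_t, x_t, c_t in zip(a, b, x, c):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys, h


def correct_chunk(ys_local, h_local, a, c, h_prev):
    """Add the contribution of the incoming state h_prev to a chunk
    that was scanned locally from a zero initial state."""
    decay = 1.0
    ys = []
    for y_t, a_t, c_t in zip(ys_local, a, c):
        decay *= a_t  # cumulative product a_1 * ... * a_t within the chunk
        ys.append(y_t + c_t * decay * h_prev)
    # The final state needs the same additive term.
    return ys, h_local + decay * h_prev


a = [0.5, 0.25, 0.5, 0.75]
b = [1.0, 2.0, 1.0, 0.5]
x = [1.0, -1.0, 2.0, 4.0]
c = [1.0, 0.5, 2.0, 1.0]

# Reference: one scan over the full sequence.
ys_full, h_full = scan(a, b, x, c)

# "Rank 0" and "rank 1" each scan their own half from a zero state.
ys0, h0 = scan(a[:2], b[:2], x[:2], c[:2])
ys1_local, h1_local = scan(a[2:], b[2:], x[2:], c[2:])

# Rank 1 receives h0 from rank 0 and applies the additive correction.
ys1, h1 = correct_chunk(ys1_local, h1_local, a[2:], c[2:], h0)

assert all(math.isclose(u, v) for u, v in zip(ys0 + ys1, ys_full))
assert math.isclose(h1, h_full)
```

In this toy setup, handing h0 from one chunk to the next plays the role of the ring shift, and the per-token term c_t * decay * h_prev plays the role of the additive output fix.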
Functions
| Name | Description |
|---|---|
| patch_nemotron_h_modeling_packing | Patch NemotronH for sample packing: seq_idx threading into Mamba2 SSM kernels. |
patch_nemotron_h_modeling_packing
monkeypatch.models.nemotron_h.modeling.patch_nemotron_h_modeling_packing()
Patch NemotronH for sample packing: seq_idx threading into Mamba2 SSM kernels.
_get_unpad_data is handled by SUPPORTED_MULTIPACK_MODEL_TYPES / patch_for_multipack(). This function only applies the seq_idx patches that are unique to nemotron_h.