monkeypatch.models.nemotron_h.modeling

Sample-packing and context-parallelism patch for NemotronH (Mamba2/Attention/MoE hybrid).

Threads seq_idx (derived from position_ids) into the Mamba2 SSM kernels so that the SSM state resets at packed-sequence boundaries. Upstream hard-codes seq_idx=None, which lets hidden state leak from one packed sample into the next. Attention and MoE blocks need no changes; only the Mamba2 mixer is patched.
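The effect of seq_idx can be illustrated with a toy sketch in plain Python. It is not the patched kernel code: it assumes that a new packed sample starts wherever position_ids resets to 0, and it replaces the Mamba2 SSM with a scalar recurrence h_t = decay * h_{t-1} + x_t. The names seq_idx_from_position_ids and toy_scan are hypothetical and exist only for this illustration.

```python
def seq_idx_from_position_ids(position_ids):
    """Assign each token the index of the packed sample it belongs to.

    Assumption for this sketch: a new sample starts wherever the
    position id resets to 0 (the first token always opens sample 0).
    """
    seq_idx = []
    current = -1
    for pos in position_ids:
        if pos == 0 or current < 0:
            current += 1
        seq_idx.append(current)
    return seq_idx


def toy_scan(xs, decay, seq_idx=None):
    """Scalar stand-in for an SSM scan: h_t = decay * h_{t-1} + x_t.

    When seq_idx is given, the state is zeroed whenever the sample
    index changes, mimicking a state reset at a packing boundary.
    """
    h = 0.0
    out = []
    for t, x in enumerate(xs):
        if seq_idx is not None and t > 0 and seq_idx[t] != seq_idx[t - 1]:
            h = 0.0  # boundary: do not carry state into the next sample
        h = decay * h + x
        out.append(h)
    return out


# Two samples packed into one row: lengths 3 and 2.
position_ids = [0, 1, 2, 0, 1]
seq_idx = seq_idx_from_position_ids(position_ids)  # [0, 0, 0, 1, 1]

xs = [1.0, 1.0, 1.0, 1.0, 1.0]
leaky = toy_scan(xs, decay=0.5)            # seq_idx=None: state crosses the boundary
packed = toy_scan(xs, decay=0.5, seq_idx=seq_idx)  # state resets at token 3
```

In this toy run, the first token of the second sample sees the leftover state from the first sample when seq_idx is None, and starts from a clean state when seq_idx is supplied.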

The context-parallel (CP) correction (ring-shift of SSM state plus an additive output fix) is handled by wrap_mamba_scan_for_cp from mamba_utils, which wraps the mamba_chunk_scan_combined call at the module level.
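Why an additive fix is enough can be sketched with the same kind of scalar recurrence. Because the recurrence is linear, a chunk scanned from a zero state differs from the true result only by the decayed contribution of the state handed over from the previous chunk. The sketch below is an illustration under that assumption, not the logic of wrap_mamba_scan_for_cp: the helpers local_scan and cp_scan are hypothetical, the "ranks" run sequentially in one process, and packing boundaries are ignored.

```python
def local_scan(xs, decay, h0=0.0):
    """Scalar recurrence h_t = decay * h_{t-1} + x_t starting from h0."""
    h = h0
    out = []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def cp_scan(chunks, decay):
    """Emulate context parallelism over non-empty chunks.

    Each "rank" scans its chunk from a zero state, then adds the
    contribution of the state received from the previous rank.
    """
    outputs = []
    carried = 0.0  # state handed over from the previous rank (0 for rank 0)
    for chunk in chunks:
        local = local_scan(chunk, decay)
        # At local step t the incoming state has decayed by decay**(t + 1).
        corrected = [y + decay ** (t + 1) * carried for t, y in enumerate(local)]
        outputs.extend(corrected)
        carried = corrected[-1]  # corrected final state goes to the next rank
    return outputs


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
full = local_scan(xs, decay=0.5)                   # single-device reference
split = cp_scan([xs[:3], xs[3:]], decay=0.5)       # two emulated ranks
max_error = max(abs(a - b) for a, b in zip(full, split))
```

Here max_error measures how far the chunked result is from the single-device scan; for this linear toy recurrence the two agree up to floating-point rounding.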

Functions

Name	Description
patch_nemotron_h_modeling_packing	Patch NemotronH for sample packing: seq_idx threading into Mamba2 SSM kernels.

patch_nemotron_h_modeling_packing

monkeypatch.models.nemotron_h.modeling.patch_nemotron_h_modeling_packing()

Patch NemotronH for sample packing: seq_idx threading into Mamba2 SSM kernels.

_get_unpad_data is handled by SUPPORTED_MULTIPACK_MODEL_TYPES / patch_for_multipack(). This function only applies the seq_idx patches that are unique to nemotron_h.