monkeypatch.models.granitemoehybrid.modeling

Sample-packing and context-parallelism patch for Granite MoE Hybrid (Mamba2/Attention/MoE).

Upstream GraniteMoeHybridMambaLayer already accepts seq_idx in forward/cuda_kernels_forward, and GraniteMoeHybridDecoderLayer passes **kwargs through to the mixer. However, the decoder layer does not receive position_ids directly; they are only available at the model level.

This patch:

1. Injects seq_idx computation into GraniteMoeHybridModel.forward so that it flows through kwargs -> decoder_layer -> mamba mixer automatically.
2. Forces the slow path when context parallelism (CP) is active, because the fused path doesn’t return the SSM state. CP correction is handled by wrap_mamba_scan_for_cp.

Functions

Name Description
patch_granitemoehybrid_modeling_packing Patch Granite MoE Hybrid for sample packing: seq_idx + CP correction.

patch_granitemoehybrid_modeling_packing

monkeypatch.models.granitemoehybrid.modeling.patch_granitemoehybrid_modeling_packing()

Patch Granite MoE Hybrid for sample packing: seq_idx + CP correction.