monkeypatch.models.granitemoehybrid.modeling
Sample-packing and context-parallelism patch for Granite MoE Hybrid (Mamba2/Attention/MoE).
Upstream GraniteMoeHybridMambaLayer already accepts seq_idx on forward/cuda_kernels_forward, and GraniteMoeHybridDecoderLayer passes **kwargs through to the mixer. However, the decoder layer does not receive position_ids directly; position_ids is only available at the model level.
This patch:
1. Injects seq_idx computation into GraniteMoeHybridModel.forward so it flows through kwargs -> decoder_layer -> mamba mixer automatically.
2. Forces the slow path when CP is active (the fused path doesn’t return SSM state). CP correction is handled by wrap_mamba_scan_for_cp.
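To make step 1 concrete, here is a minimal sketch of how a per-token sequence index could be derived from packed position_ids. It is not the patch's actual code: the helper name seq_idx_from_position_ids is hypothetical, plain Python lists stand in for tensors, and it assumes every packed sample restarts its positions at 0 and that each row begins with position 0.

```python
from itertools import accumulate


def seq_idx_from_position_ids(position_ids):
    """Hypothetical helper: map packed position_ids to per-token sample indices.

    position_ids is a list of rows (batch x seq_len), where each packed
    sample restarts its positions at 0. Assumes each row starts at 0.
    """
    seq_idx = []
    for row in position_ids:
        # A position of 0 marks the first token of a new packed sample.
        starts = [1 if pos == 0 else 0 for pos in row]
        # Running count of sample starts, shifted so the first sample is 0.
        seq_idx.append([count - 1 for count in accumulate(starts)])
    return seq_idx


# Three samples of lengths 3, 2, and 4 packed into one row.
packed = [[0, 1, 2, 0, 1, 0, 1, 2, 3]]
print(seq_idx_from_position_ids(packed))  # [[0, 0, 0, 1, 1, 2, 2, 2, 2]]
```

Tokens that share an index belong to the same packed sample, which is the information the Mamba mixer needs in order to keep state from leaking across sample boundaries. On tensors, the same idea is typically a cumulative sum over a "position equals 0" mask.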
Functions
| Name | Description |
|---|---|
| patch_granitemoehybrid_modeling_packing | Patch Granite MoE Hybrid for sample packing: seq_idx + CP correction. |
patch_granitemoehybrid_modeling_packing
monkeypatch.models.granitemoehybrid.modeling.patch_granitemoehybrid_modeling_packing()
Patch Granite MoE Hybrid for sample packing: seq_idx + CP correction.