monkeypatch.models.falcon_h1.modeling
Sample-packing and context-parallelism patch for Falcon-H1 (parallel Mamba2/Attention hybrid).
Threads seq_idx (derived from position_ids) into the Mamba2 SSM kernels so that the SSM state resets at packed-sequence boundaries. Upstream hard-codes seq_idx=None, which lets hidden state leak from one packed sample into the next.
Unlike Nemotron-H (which selects block_type per layer), Falcon-H1 runs both Mamba2 and Attention in parallel in every FalconH1DecoderLayer, so the Mamba2 branch needs seq_idx in every layer.
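To make the boundary reset concrete, here is a minimal pure-Python sketch. It is not the patch's actual code: `seq_idx_from_position_ids` and `toy_scan` are hypothetical helpers, the recurrence is a toy stand-in for the Mamba2 SSM kernel, and it assumes each packed sample restarts position_ids at 0.

```python
from itertools import accumulate


def seq_idx_from_position_ids(position_ids):
    """Map each token to the index of the packed sample it belongs to.

    Assumption (not confirmed by the source): every packed sample
    restarts position_ids at 0.
    """
    # Any token after the first whose position is 0 opens a new sample.
    starts = [1 if i > 0 and p == 0 else 0 for i, p in enumerate(position_ids)]
    return list(accumulate(starts))


def toy_scan(xs, seq_idx=None, decay=0.5):
    """Toy recurrence h_t = decay * h_{t-1} + x_t, standing in for an SSM.

    When seq_idx is given, the state is zeroed whenever the sample index changes.
    """
    h, out = 0.0, []
    for t, x in enumerate(xs):
        if seq_idx is not None and t > 0 and seq_idx[t] != seq_idx[t - 1]:
            h = 0.0  # boundary: do not carry state into the next sample
        h = decay * h + x
        out.append(h)
    return out


position_ids = [0, 1, 2, 0, 1]  # two samples packed into one row
xs = [1.0, 1.0, 1.0, 1.0, 1.0]
seq_idx = seq_idx_from_position_ids(position_ids)

leaky = toy_scan(xs)  # analogous to seq_idx=None: state crosses the boundary
reset = toy_scan(xs, seq_idx)  # state restarts at the second sample
separate = toy_scan(xs[:3]) + toy_scan(xs[3:])  # each sample processed alone
```

With the reset in place, the packed result matches processing each sample on its own; without it, the outputs of the second sample depend on the first.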
Functions
| Name | Description |
|---|---|
| patch_falcon_h1_modeling_packing | Patch Falcon-H1 for sample packing: seq_idx threading into Mamba2 SSM kernels. |
patch_falcon_h1_modeling_packing
monkeypatch.models.falcon_h1.modeling.patch_falcon_h1_modeling_packing()
Patch Falcon-H1 for sample packing: seq_idx threading into Mamba2 SSM kernels.