monkeypatch.models.falcon_h1.modeling

Sample-packing and context-parallelism patch for Falcon-H1 (parallel Mamba2/Attention hybrid).

Threads seq_idx (derived from position_ids) into the Mamba2 SSM kernels so that the SSM state resets at every packed-sequence boundary. Upstream hard-codes seq_idx=None, so hidden state leaks from one packed sample into the next.
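As a rough illustration of how a per-token sample index can be derived from position_ids, the sketch below uses plain Python lists rather than tensors. The helper name seq_idx_from_position_ids is hypothetical and is not taken from the patch. It assumes that every packed sample restarts its positions at 0 and that the row itself begins at position 0.

```python
from itertools import accumulate


def seq_idx_from_position_ids(position_ids):
    """Map each token to the index of the packed sample it belongs to.

    Hypothetical helper for illustration only. Assumes each packed sample
    restarts its positions at 0 and that the row begins at position 0.
    """
    # A token opens a new sample when its position is 0.
    starts = [1 if p == 0 else 0 for p in position_ids]
    # Running count of sample starts, shifted so the first sample is index 0.
    return [count - 1 for count in accumulate(starts)]


# Three samples of lengths 3, 2 and 4 packed into one row.
position_ids = [0, 1, 2, 0, 1, 0, 1, 2, 3]
seq_idx = seq_idx_from_position_ids(position_ids)
print(seq_idx)  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
```

A tensor version would typically apply the same idea along the sequence dimension with a cumulative sum, but the exact form used by the patch is not shown here.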

Unlike Nemotron-H (which selects block_type per layer), Falcon-H1 runs both Mamba2 and Attention in parallel in every FalconH1DecoderLayer, so the Mamba2 branch of every layer needs seq_idx.
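To make the effect of a missing seq_idx concrete, here is a toy scalar recurrence standing in for the SSM scan. It is not the Mamba2 kernel: toy_ssm_scan, its decay parameter, and the inputs are all made up for illustration. With seq_idx=None the state carries across the sample boundary, and with seq_idx supplied it is zeroed there.

```python
def toy_ssm_scan(xs, decay=0.5, seq_idx=None):
    """Toy recurrence h_t = decay * h_{t-1} + x_t (illustrative stand-in).

    When seq_idx is given, the state is zeroed whenever the sample index
    changes, so no state crosses a packed-sequence boundary.
    """
    h = 0.0
    out = []
    for t, x in enumerate(xs):
        if seq_idx is not None and t > 0 and seq_idx[t] != seq_idx[t - 1]:
            h = 0.0  # new packed sample: drop the previous sample's state
        h = decay * h + x
        out.append(h)
    return out


# Two samples of length 2 packed into one row.
xs = [1.0, 1.0, 1.0, 1.0]
seq_idx = [0, 0, 1, 1]

leaky = toy_ssm_scan(xs)                       # behaves like seq_idx=None
isolated = toy_ssm_scan(xs, seq_idx=seq_idx)   # state resets at the boundary

print(leaky)     # [1.0, 1.5, 1.75, 1.875]
print(isolated)  # [1.0, 1.5, 1.0, 1.5]
```

In the isolated run the second sample produces the same outputs as the first, which is what training on the two samples separately would give. Because every Falcon-H1 decoder layer contains a Mamba2 branch, the same reset is needed in every layer.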

Functions

Name Description
patch_falcon_h1_modeling_packing Patch Falcon-H1 for sample packing: seq_idx threading into Mamba2 SSM kernels.

patch_falcon_h1_modeling_packing

monkeypatch.models.falcon_h1.modeling.patch_falcon_h1_modeling_packing()

Patch Falcon-H1 for sample packing: seq_idx threading into Mamba2 SSM kernels.