monkeypatch.models.qwen3_next.modeling
Monkeypatch for Qwen3_Next model to pass position_ids to linear attention.
Functions
| Name | Description |
|---|---|
| get_cu_seqlens | Adapted from transformers.modeling_flash_attention_utils.prepare_fa_kwargs_from_position_ids. |
| patch_qwen3_next_decoder_layer | Patch Qwen3NextDecoderLayer to pass position_ids to linear attention. |
| patch_qwen3_next_gateddelta_layer | Patch Qwen3NextGatedDeltaNet to parse cu_seqlens and pass them to chunk_gated_delta_rule. |
| patch_qwen3_next_imports | Patch Qwen3Next imports to use try/except instead of is_flash_linear_attention_available. |
| patch_qwen3_next_modeling_packing | Apply all Qwen3Next model patches. |
get_cu_seqlens
monkeypatch.models.qwen3_next.modeling.get_cu_seqlens(position_ids)
Adapted from transformers.modeling_flash_attention_utils.prepare_fa_kwargs_from_position_ids.
https://github.com/huggingface/transformers/blob/0f1b128d3359a26bd18be99c26d7f04fb3cba914/src/transformers/modeling_flash_attention_utils.py#L316
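The idea behind cumulative sequence lengths can be illustrated with a minimal pure-Python sketch. It assumes that packed samples are marked by position_ids restarting at 0 and that the input is non-empty and begins with 0. The name cu_seqlens_from_position_ids is hypothetical and is not part of this module; the actual helper presumably works on tensors rather than lists.

```python
from typing import List, Tuple


def cu_seqlens_from_position_ids(position_ids: List[int]) -> Tuple[List[int], int]:
    """Hypothetical illustration, not the module's get_cu_seqlens.

    Assumes a flat, non-empty list of position ids that starts with 0.
    """
    # A new packed sequence begins wherever the position counter resets to 0.
    starts = [i for i, p in enumerate(position_ids) if p == 0]
    # Append the total length so consecutive entries delimit each sequence.
    cu_seqlens = starts + [len(position_ids)]
    # The longest sequence is the largest gap between neighbouring boundaries.
    max_len = max(end - start for start, end in zip(cu_seqlens, cu_seqlens[1:]))
    return cu_seqlens, max_len


# Three sequences of lengths 3, 2, and 4 packed into one row.
packed = [0, 1, 2, 0, 1, 0, 1, 2, 3]
cu, max_len = cu_seqlens_from_position_ids(packed)
print(cu, max_len)  # [0, 3, 5, 9] 4
```

Each adjacent pair in the result, such as (3, 5), gives the start and end offsets of one packed sequence.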
patch_qwen3_next_decoder_layer
monkeypatch.models.qwen3_next.modeling.patch_qwen3_next_decoder_layer()
Patch Qwen3NextDecoderLayer to pass position_ids to linear attention.
patch_qwen3_next_gateddelta_layer
monkeypatch.models.qwen3_next.modeling.patch_qwen3_next_gateddelta_layer()
Patch Qwen3NextGatedDeltaNet to parse cu_seqlens and pass them to chunk_gated_delta_rule.
patch_qwen3_next_imports
monkeypatch.models.qwen3_next.modeling.patch_qwen3_next_imports()
Patch Qwen3Next imports to use try/except instead of is_flash_linear_attention_available.
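The general try/except import pattern that this patch refers to might look like the sketch below. It is a generic illustration only: try_import is a hypothetical helper, the module names are placeholders, and nothing here reflects how the patch itself is written.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(name: str) -> Optional[ModuleType]:
    """Hypothetical helper: return the module if importable, else None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package lands here instead of raising at import time.
        return None


# Placeholder module names, chosen only to demonstrate both outcomes.
json_mod = try_import("json")
missing = try_import("a_module_that_does_not_exist_123")

print(json_mod is not None, missing is None)
```

Attempting the import directly, rather than consulting a separate availability check, means the fallback path is driven by whether the import actually succeeds.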
patch_qwen3_next_modeling_packing
monkeypatch.models.qwen3_next.modeling.patch_qwen3_next_modeling_packing()
Apply all Qwen3Next model patches.