monkeypatch.models.qwen3_next.modeling

Monkeypatch for the Qwen3_Next model that passes position_ids to linear attention.

Functions

Name	Description
get_cu_seqlens	Adapted from transformers.modeling_flash_attention_utils.prepare_fa_kwargs_from_position_ids.
patch_qwen3_next_decoder_layer	Patch Qwen3NextDecoderLayer to pass position_ids to linear attention.
patch_qwen3_next_gateddelta_layer	Patch Qwen3NextGatedDeltaNet to parse cu_seqlens and pass them to chunk_gated_delta_rule.
patch_qwen3_next_imports	Patch Qwen3Next imports to use try/except instead of is_flash_linear_attention_available.
patch_qwen3_next_modeling_packing	Apply all Qwen3Next model patches.

monkeypatch.models.qwen3_next.modeling.get_cu_seqlens(position_ids)

Adapted from transformers.modeling_flash_attention_utils.prepare_fa_kwargs_from_position_ids.

https://github.com/huggingface/transformers/blob/0f1b128d3359a26bd18be99c26d7f04fb3cba914/src/transformers/modeling_flash_attention_utils.py#L316
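As a rough illustration of the idea behind this helper, the sketch below derives cumulative sequence boundaries from the position ids of a packed row. It uses plain Python lists rather than tensors, and the function name cu_seqlens_from_position_ids is hypothetical; the real get_cu_seqlens may differ in input shape, dtype, and return values.

```python
def cu_seqlens_from_position_ids(position_ids):
    """Return cumulative sequence boundaries for one packed row of position ids.

    Illustrative only: assumes a flat list in which every packed
    sequence restarts its position ids at 0.
    """
    # A new sequence starts wherever the position id resets to 0.
    starts = [i for i, pos in enumerate(position_ids) if pos == 0]
    # The total token count closes the last sequence.
    return starts + [len(position_ids)]


# Three sequences of lengths 3, 2 and 4 packed into one row.
packed = [0, 1, 2, 0, 1, 0, 1, 2, 3]
cu_seqlens = cu_seqlens_from_position_ids(packed)

# Differences between neighbouring boundaries recover the per-sequence lengths.
seq_lengths = [end - start for start, end in zip(cu_seqlens, cu_seqlens[1:])]
```

Boundaries in this form let a kernel treat each packed sequence independently instead of letting state leak across sequence borders.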

monkeypatch.models.qwen3_next.modeling.patch_qwen3_next_decoder_layer()

Patch Qwen3NextDecoderLayer to pass position_ids to linear attention.
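The general monkeypatching pattern can be sketched with toy classes. ToyDecoderLayer, ToyLinearAttention, and patch_toy_decoder_layer are hypothetical stand-ins, not the real Qwen3Next classes, and the actual patch may be implemented differently.

```python
class ToyLinearAttention:
    def forward(self, hidden_states, position_ids=None):
        # Return what was received so the effect of the patch is visible.
        return hidden_states, position_ids


class ToyDecoderLayer:
    def __init__(self):
        self.linear_attn = ToyLinearAttention()

    def forward(self, hidden_states, position_ids=None):
        # Unpatched behaviour: position_ids is not forwarded.
        return self.linear_attn.forward(hidden_states)


def patch_toy_decoder_layer():
    """Replace ToyDecoderLayer.forward with a version that forwards position_ids."""

    def patched_forward(self, hidden_states, position_ids=None):
        return self.linear_attn.forward(hidden_states, position_ids=position_ids)

    # Assigning on the class affects existing and future instances.
    ToyDecoderLayer.forward = patched_forward


layer = ToyDecoderLayer()
before = layer.forward([1.0, 2.0], position_ids=[0, 1])
patch_toy_decoder_layer()
after = layer.forward([1.0, 2.0], position_ids=[0, 1])
```

Patching the class attribute rather than a single instance means every layer of the model picks up the new forward without being rebuilt.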

monkeypatch.models.qwen3_next.modeling.patch_qwen3_next_gateddelta_layer()

Patch Qwen3NextGatedDeltaNet to parse cu_seqlens and pass them to chunk_gated_delta_rule.

monkeypatch.models.qwen3_next.modeling.patch_qwen3_next_imports()

Patch Qwen3Next imports to use try/except instead of is_flash_linear_attention_available.
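A try/except import guard of the kind described here might look like the sketch below. The helper try_import is hypothetical, and the module path fla.ops.gated_delta_rule is an assumption about where the optional kernel lives; the real patch may import different names or handle failure differently.

```python
import importlib


def try_import(module_name, attr):
    """Return module_name.attr, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Optional dependency missing: let the caller choose a fallback.
        return None
    return getattr(module, attr, None)


# Assumed location of the optional kernel; None when it is unavailable.
chunk_gated_delta_rule = try_import("fla.ops.gated_delta_rule", "chunk_gated_delta_rule")
```

Attempting the import directly checks the thing that matters, whether the symbol can be loaded, instead of relying on a separate availability flag.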

monkeypatch.models.qwen3_next.modeling.patch_qwen3_next_modeling_packing()

Apply all Qwen3Next model patches.