monkeypatch.models.qwen3_next.modeling

Monkeypatch for Qwen3_Next model to pass position_ids to linear attention.

Functions

Name Description
get_cu_seqlens Compute cumulative sequence lengths (cu_seqlens) from position_ids. Adapted from transformers.modeling_flash_attention_utils.prepare_fa_kwargs_from_position_ids.
patch_qwen3_next_decoder_layer Patch Qwen3NextDecoderLayer to pass position_ids to linear attention.
patch_qwen3_next_gateddelta_layer Patch Qwen3NextGatedDeltaNet to parse cu_seqlens and pass them to chunk_gated_delta_rule.
patch_qwen3_next_imports Patch Qwen3Next imports to use try/except instead of is_flash_linear_attention_available.
patch_qwen3_next_modeling_packing Apply all Qwen3Next model patches.

get_cu_seqlens

monkeypatch.models.qwen3_next.modeling.get_cu_seqlens(position_ids)

Compute cumulative sequence lengths (cu_seqlens) from position_ids. Adapted from transformers.modeling_flash_attention_utils.prepare_fa_kwargs_from_position_ids.

https://github.com/huggingface/transformers/blob/0f1b128d3359a26bd18be99c26d7f04fb3cba914/src/transformers/modeling_flash_attention_utils.py#L316
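To illustrate the idea behind cumulative sequence lengths, here is a minimal pure-Python sketch. It is not the function's actual implementation, which is assumed to operate on tensors. The helper name `cu_seqlens_from_position_ids` is hypothetical, and the sketch assumes a flattened packed batch in which every sequence restarts its position counter at 0.

```python
def cu_seqlens_from_position_ids(position_ids):
    """Return cumulative sequence boundaries for a flattened, packed batch.

    Hypothetical helper for illustration only. `position_ids` is a flat list
    of ints in which each packed sequence restarts at position 0.
    """
    # A new sequence begins wherever the position counter resets to 0.
    starts = [i for i, pos in enumerate(position_ids) if pos == 0]
    # Boundaries are the start offsets followed by the total token count.
    return starts + [len(position_ids)]


# Three sequences of lengths 3, 2 and 4 packed into one row.
packed = [0, 1, 2, 0, 1, 0, 1, 2, 3]
cu_seqlens = cu_seqlens_from_position_ids(packed)

# Individual lengths are the differences between adjacent boundaries.
seq_lens = [end - start for start, end in zip(cu_seqlens, cu_seqlens[1:])]
max_len = max(seq_lens)
```

Each adjacent pair in `cu_seqlens` delimits one packed sequence, which is what lets a varlen kernel avoid mixing state across sequence boundaries.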

patch_qwen3_next_decoder_layer

monkeypatch.models.qwen3_next.modeling.patch_qwen3_next_decoder_layer()

Patch Qwen3NextDecoderLayer to pass position_ids to linear attention.
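The general pattern of such a patch can be sketched with stand-in classes. Everything below (`ToyLinearAttention`, `ToyDecoderLayer`, `patch_toy_decoder_layer`) is hypothetical and only mirrors the shape of the idea: replace a class's `forward` so that an argument it previously dropped is forwarded to a submodule. The real Qwen3Next classes and signatures are not reproduced here.

```python
class ToyLinearAttention:
    """Hypothetical stand-in for a linear-attention module."""

    def forward(self, hidden_states, position_ids=None):
        # Echo the inputs so the effect of the patch is observable.
        return {"hidden_states": hidden_states, "position_ids": position_ids}


class ToyDecoderLayer:
    """Hypothetical stand-in for a decoder layer that drops position_ids."""

    def __init__(self):
        self.linear_attn = ToyLinearAttention()

    def forward(self, hidden_states, position_ids=None):
        # Unpatched behavior: position_ids is accepted but never forwarded.
        return self.linear_attn.forward(hidden_states)


def patch_toy_decoder_layer():
    """Swap in a forward that passes position_ids to linear attention."""

    def patched_forward(self, hidden_states, position_ids=None):
        return self.linear_attn.forward(hidden_states, position_ids=position_ids)

    # Assigning on the class affects existing and future instances alike.
    ToyDecoderLayer.forward = patched_forward


layer = ToyDecoderLayer()
before = layer.forward([1.0, 2.0], position_ids=[0, 1])
patch_toy_decoder_layer()
after = layer.forward([1.0, 2.0], position_ids=[0, 1])
```

Because the replacement is assigned on the class rather than on an instance, the patch should be applied before or after model construction with the same effect.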

patch_qwen3_next_gateddelta_layer

monkeypatch.models.qwen3_next.modeling.patch_qwen3_next_gateddelta_layer()

Patch Qwen3NextGatedDeltaNet to parse cu_seqlens and pass them to chunk_gated_delta_rule.

patch_qwen3_next_imports

monkeypatch.models.qwen3_next.modeling.patch_qwen3_next_imports()

Patch Qwen3Next imports to use try/except instead of is_flash_linear_attention_available.
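The try/except approach to an optional dependency can be sketched as follows. The `optional_import` helper is hypothetical and is not part of this module; it only shows the pattern of attempting the import directly and falling back to `None`, instead of consulting an availability check first.

```python
import importlib


def optional_import(module_name, attr):
    """Return `attr` from `module_name`, or None if the module is missing.

    Hypothetical helper illustrating a try/except import guard.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The optional dependency is not installed; callers must handle None.
        return None
    return getattr(module, attr, None)


# A module that exists resolves to the real attribute.
sqrt = optional_import("math", "sqrt")

# A module that does not exist resolves to None instead of raising.
missing = optional_import("no_such_module_for_demo", "anything")

# In the Qwen3Next case the guarded name would be something like
# chunk_gated_delta_rule; its exact import path is an assumption and is
# deliberately not executed here.
```

Code that relies on the guarded name is expected to check for `None` and fall back to a slower path or raise a clear error.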

patch_qwen3_next_modeling_packing

monkeypatch.models.qwen3_next.modeling.patch_qwen3_next_modeling_packing()

Apply all Qwen3Next model patches.