processing_strategies

Module containing the ProcessingStrategy base class and its derivatives for different multimodal model types.

Classes

Name Description
Gemma3ProcessingStrategy Processing Strategy class for Gemma3.
Gemma3nProcessingStrategy Gemma3n: same turn boundaries as Gemma3, additionally masks audio/delimiter tokens.
Gemma4ProcessingStrategy Processing Strategy class for Gemma 4.
Gemma4UnifiedProcessingStrategy Processing Strategy for Gemma 4 Unified (encoder-free image/audio/video).
Glm4vProcessingStrategy Shared strategy for Glm4vProcessor (GLM-4V / GLM-4.1V) and Glm46VProcessor (GLM-4.6V / GLM-4.7V), which use identical media-token markers.
InternVLProcessingStrategy Processing Strategy class for InternVL.
Llama3_2VisionProcessingStrategy Processing Strategy class for Llama-3.2 Vision (<|start_header_id|>{role}<|end_header_id|>\n\n ... <|eot_id|>).
Llama4ProcessingStrategy Processing Strategy class for Llama 4 (<|header_start|>{role}<|header_end|>\n\n ... <|eot|>).
Mistral3ProcessingStrategy Processing Strategy class for Mistral3.
MistralV7TekkenProcessingStrategy Processing Strategy class for Mistral v7 Tekken (Pixtral-style plus [SYSTEM_PROMPT]...[/SYSTEM_PROMPT]).
PixtralProcessingStrategy Processing Strategy class for Pixtral ([INST] ... [/INST] user, assistant terminates at eos_token).
ProcessingStrategy Base Processing Strategy class.
Qwen2VLProcessingStrategy Processing Strategy class for Qwen2-VL (ChatML <|im_start|>{role}\n ... <|im_end|>).
Qwen3_5ProcessingStrategy Processing Strategy class for Qwen3.5 (Qwen2-VL boundaries + <|video_pad|> mask).
RoleBoundary One role’s token-level span markers for the masking scanner.
SmolVLM2ProcessingStrategy Processing Strategy class for SmolVLM2.
VoxtralProcessingStrategy Processing Strategy class for Voxtral.

Gemma3ProcessingStrategy

processing_strategies.Gemma3ProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy class for Gemma3.

Gemma3nProcessingStrategy

processing_strategies.Gemma3nProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Gemma3n: same turn boundaries as Gemma3, additionally masks audio/delimiter tokens.

Gemma4ProcessingStrategy

processing_strategies.Gemma4ProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy class for Gemma 4.

The boundary markers <|turn>model ... <turn|> were verified against google/gemma-4-E2B-it. The boi/eoi/boa/eoa ids are resolved via convert_tokens_to_ids because the processor exposes only their string forms.
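As a rough illustration of resolving ids from string forms, the sketch below uses a stand-in tokenizer. StubTokenizer, resolve_ids, and the token strings and ids are hypothetical placeholders, not values from any Gemma checkpoint; only the method name convert_tokens_to_ids comes from the description above.

```python
from typing import Dict, List


class StubTokenizer:
    """Hypothetical stand-in exposing a convert_tokens_to_ids method."""

    def __init__(self, vocab: Dict[str, int]):
        self._vocab = vocab

    def convert_tokens_to_ids(self, token: str) -> int:
        # Simplified: a real tokenizer may handle unknown tokens differently.
        return self._vocab[token]


def resolve_ids(tokenizer, tokens: List[str]) -> List[int]:
    """Look up each marker string once instead of hard-coding its id."""
    return [tokenizer.convert_tokens_to_ids(t) for t in tokens]


# Placeholder marker strings and ids, chosen only for this example.
tokenizer = StubTokenizer({"<boi>": 101, "<eoi>": 102, "<boa>": 103, "<eoa>": 104})
delimiter_ids = resolve_ids(tokenizer, ["<boi>", "<eoi>", "<boa>", "<eoa>"])
```

Resolving the ids at construction time keeps the strategy independent of any particular vocabulary layout.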

Gemma4UnifiedProcessingStrategy

processing_strategies.Gemma4UnifiedProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy for Gemma 4 Unified (encoder-free image/audio/video).

The unified checkpoint shares Gemma 4’s turn format and the same media placeholder/delimiter token set (image/audio/video, boi/eoi/boa/eoa), so boundary detection and label masking are inherited unchanged — both resolve ids dynamically from the processor/tokenizer rather than hard-coding them. The encoder-free raw pixel/waveform projection is handled entirely by the HF processor, so the strategy itself needs no audio/vision-specific logic.

Glm4vProcessingStrategy

processing_strategies.Glm4vProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Shared strategy for Glm4vProcessor (GLM-4V / GLM-4.1V) and Glm46VProcessor (GLM-4.6V / GLM-4.7V) — identical media-token markers.

Role boundaries unverified; use cfg.role_boundaries to enable masking.

InternVLProcessingStrategy

processing_strategies.InternVLProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy class for InternVL.

Role boundaries NOT declared (InternLM-style template unverified); falls back to pad + image-id masking with a one-shot warning.

Llama3_2VisionProcessingStrategy

processing_strategies.Llama3_2VisionProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy class for Llama-3.2 Vision (<|start_header_id|>{role}<|end_header_id|>\n\n ... <|eot_id|>).

Llama4ProcessingStrategy

processing_strategies.Llama4ProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy class for Llama 4 (<|header_start|>{role}<|header_end|>\n\n ... <|eot|>).

Mistral3ProcessingStrategy

processing_strategies.Mistral3ProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy class for Mistral3.

Role boundaries NOT declared (mistral-common instruct tokenizer unverified); same fallback as VoxtralProcessingStrategy.

MistralV7TekkenProcessingStrategy

processing_strategies.MistralV7TekkenProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy class for Mistral v7 Tekken (Pixtral-style plus [SYSTEM_PROMPT]...[/SYSTEM_PROMPT]).

Uses the same shared-[/INST] marker treatment as PixtralProcessingStrategy.

PixtralProcessingStrategy

processing_strategies.PixtralProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy class for Pixtral ([INST] ... [/INST] user, assistant terminates at eos_token).

[/INST] is shared between user-end and assistant-start. We declare user with include_end=False so the scanner hands the [/INST] back to assistant’s start match on the next iteration.
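The hand-back described above can be sketched with a toy scanner. This is a minimal illustration under stated assumptions, not the module's actual scanner: Boundary, scan_roles, and the token ids (10 for [INST], 11 for [/INST], 2 for eos) are all made up for the example.

```python
from typing import List, NamedTuple


class Boundary(NamedTuple):
    role: str
    start: List[int]
    end: List[int]
    include_end: bool


def scan_roles(ids: List[int], boundaries: List[Boundary]) -> List[str]:
    """Label each position with the role whose turn covers it ('' if none)."""
    roles = [""] * len(ids)
    i = 0
    while i < len(ids):
        match = next((b for b in boundaries if ids[i:i + len(b.start)] == b.start), None)
        if match is None:
            i += 1
            continue
        j = i + len(match.start)
        # Advance to the end marker, or to the end of the sequence.
        while j < len(ids) and ids[j:j + len(match.end)] != match.end:
            j += 1
        stop = min(j + len(match.end), len(ids)) if match.include_end else j
        for k in range(i, stop):
            roles[k] = match.role
        # With include_end=False the end marker is left unconsumed, so the
        # next iteration can match it as another role's start marker.
        i = stop
    return roles


INST, END_INST, EOS = 10, 11, 2  # hypothetical ids
ids = [INST, 20, 21, END_INST, 30, 31, EOS]
assistant = Boundary("assistant", [END_INST], [EOS], include_end=True)

shared = scan_roles(ids, [Boundary("user", [INST], [END_INST], False), assistant])
consumed = scan_roles(ids, [Boundary("user", [INST], [END_INST], True), assistant])
```

In this toy setup, `shared` assigns the [/INST] position and everything up to eos to the assistant, while `consumed` never finds an assistant turn because the user span has already swallowed the marker.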

ProcessingStrategy

processing_strategies.ProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Base Processing Strategy class.

Subclasses opt in to role masking by overriding _build_role_boundaries; otherwise only pad and media tokens are masked (the legacy behavior, which emits a one-time warning).
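The fallback path, where only pad and media tokens are masked, might look roughly like the sketch below. The function name mask_pad_and_media and the token ids are hypothetical, and the real base class may differ; the value -100 is the default ignore index of PyTorch's cross-entropy loss.

```python
from typing import Iterable, List

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_pad_and_media(input_ids: List[int], pad_id: int, media_ids: Iterable[int]) -> List[int]:
    """Copy input_ids into labels, hiding pad and media placeholder tokens."""
    blocked = set(media_ids) | {pad_id}
    return [IGNORE_INDEX if tok in blocked else tok for tok in input_ids]


# Hypothetical ids: 0 is padding, 99 is an image placeholder.
labels = mask_pad_and_media([0, 5, 99, 99, 6, 0], pad_id=0, media_ids=[99])
```

Every other token, including user and system turns, stays trainable in this mode, which is why declaring role boundaries matters when only assistant turns should contribute to the loss.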

Qwen2VLProcessingStrategy

processing_strategies.Qwen2VLProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy class for Qwen2-VL (ChatML <|im_start|>{role}\n ... <|im_end|>).

Qwen3_5ProcessingStrategy

processing_strategies.Qwen3_5ProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy class for Qwen3.5 (Qwen2-VL boundaries + <|video_pad|> mask).

RoleBoundary

processing_strategies.RoleBoundary(
    role,
    start_tokens,
    end_tokens=list(),
    include_start=False,
    include_end=True,
)

One role’s token-level span markers for the masking scanner.

Empty end_tokens means end-of-sequence terminates the span.
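To make the flags concrete, here is a small sketch of how a span could be derived from one boundary. The dataclass below only mirrors the documented constructor fields; first_span, its treatment of an unterminated turn, and the toy token ids are assumptions for illustration rather than the module's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RoleBoundary:
    """Illustrative stand-in mirroring the documented constructor fields."""
    role: str
    start_tokens: List[int]
    end_tokens: List[int] = field(default_factory=list)
    include_start: bool = False
    include_end: bool = True


def _find(ids: List[int], pattern: List[int], begin: int) -> int:
    """Return the first index >= begin where pattern occurs in ids, or -1."""
    if not pattern:
        return -1
    for i in range(begin, len(ids) - len(pattern) + 1):
        if ids[i:i + len(pattern)] == pattern:
            return i
    return -1


def first_span(ids: List[int], b: RoleBoundary) -> Optional[Tuple[int, int]]:
    """Half-open [lo, hi) span covered by the first turn of b.role, or None."""
    start = _find(ids, b.start_tokens, 0)
    if start < 0:
        return None
    body = start + len(b.start_tokens)
    lo = start if b.include_start else body
    if not b.end_tokens:
        return lo, len(ids)  # empty end_tokens: span runs to end of sequence
    end = _find(ids, b.end_tokens, body)
    if end < 0:
        return lo, len(ids)  # assumption: an unterminated turn also runs to the end
    hi = end + len(b.end_tokens) if b.include_end else end
    return lo, hi


# Toy ids: [1, 2] is an assistant header, 9 is an end-of-turn marker.
ids = [5, 6, 1, 2, 7, 8, 9, 4]
span_default = first_span(ids, RoleBoundary("assistant", [1, 2], [9]))
span_no_end = first_span(ids, RoleBoundary("assistant", [1, 2], [9], include_end=False))
span_open = first_span(ids, RoleBoundary("assistant", [1, 2]))
```

With the defaults, the header is excluded from the span and the end marker is included, so the end-of-turn token falls inside the assistant span.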

SmolVLM2ProcessingStrategy

processing_strategies.SmolVLM2ProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy class for SmolVLM2.

Role boundaries NOT declared — SmolVLM2 chat_template varies per checkpoint (HuggingFaceTB ships multiple variants), so we opt out rather than mis-mask.

VoxtralProcessingStrategy

processing_strategies.VoxtralProcessingStrategy(
    processor,
    chat_template=None,
    image_size=None,
    image_resize_algorithm=None,
    train_on_inputs=False,
    roles_to_train=None,
    train_on_eos=None,
    role_boundaries_override=None,
    field_messages=None,
)

Processing Strategy class for Voxtral.

Role boundaries NOT declared — mistral-common instruct tokenizer markers unverified. Falls back to pad+audio masking with a one-shot warning.