processing_strategies
Module containing the ProcessingStrategy base class and its derivatives for different multimodal model types.
Classes
| Name | Description |
|---|---|
| Gemma3ProcessingStrategy | Processing Strategy class for Gemma3. |
| Gemma3nProcessingStrategy | Gemma3n: same turn boundaries as Gemma3, additionally masks audio/delimiter tokens. |
| Gemma4ProcessingStrategy | Processing Strategy class for Gemma 4. |
| Gemma4UnifiedProcessingStrategy | Processing Strategy for Gemma 4 Unified (encoder-free image/audio/video). |
| Glm4vProcessingStrategy | Shared strategy for Glm4vProcessor (GLM-4V / GLM-4.1V) and Glm46VProcessor (GLM-4.6V / GLM-4.7V); identical media-token markers. |
| InternVLProcessingStrategy | Processing Strategy class for InternVL. |
| Llama3_2VisionProcessingStrategy | Processing Strategy class for Llama-3.2 Vision (<\|start_header_id\|>{role}<\|end_header_id\|>\n\n ... <\|eot_id\|>). |
| Llama4ProcessingStrategy | Processing Strategy class for Llama 4 (<\|header_start\|>{role}<\|header_end\|>\n\n ... <\|eot\|>). |
| Mistral3ProcessingStrategy | Processing Strategy class for Mistral3. |
| MistralV7TekkenProcessingStrategy | Processing Strategy class for Mistral v7 Tekken (Pixtral-style plus [SYSTEM_PROMPT]...[/SYSTEM_PROMPT]). |
| PixtralProcessingStrategy | Processing Strategy class for Pixtral ([INST] ... [/INST] user, assistant terminates at eos_token). |
| ProcessingStrategy | Base Processing Strategy class. |
| Qwen2VLProcessingStrategy | Processing Strategy class for Qwen2-VL (ChatML <\|im_start\|>{role}\n ... <\|im_end\|>). |
| Qwen3_5ProcessingStrategy | Processing Strategy class for Qwen3.5 (Qwen2-VL boundaries + <\|video_pad\|> mask). |
| RoleBoundary | One role’s token-level span markers for the masking scanner. |
| SmolVLM2ProcessingStrategy | Processing Strategy class for SmolVLM2. |
| VoxtralProcessingStrategy | Processing Strategy class for Voxtral. |
Gemma3ProcessingStrategy
processing_strategies.Gemma3ProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy class for Gemma3.
Gemma3nProcessingStrategy
processing_strategies.Gemma3nProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Gemma3n: same turn boundaries as Gemma3, additionally masks audio/delimiter tokens.
Gemma4ProcessingStrategy
processing_strategies.Gemma4ProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy class for Gemma 4.
Boundary markers <|turn>model ... <turn|> verified against
google/gemma-4-E2B-it. boi/eoi/boa/eoa ids are resolved via
convert_tokens_to_ids since only their string forms are on the processor.
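As a rough illustration of resolving delimiter ids from their string forms, the sketch below runs against a stub tokenizer so it needs no model download. Only the method name `convert_tokens_to_ids` comes from the text above; the attribute names (`boi_token`, `eoi_token`, `boa_token`, `eoa_token`), the placeholder token strings, and the helper `resolve_delimiter_ids` are assumptions made for the example, not the strategy's actual code.

```python
from types import SimpleNamespace


class StubTokenizer:
    """Hypothetical stand-in that mimics convert_tokens_to_ids on a HF tokenizer."""

    def __init__(self, vocab, unk_token_id=0):
        self.vocab = vocab
        self.unk_token_id = unk_token_id

    def convert_tokens_to_ids(self, tokens):
        # Accepts a single token string or a list of token strings.
        if isinstance(tokens, str):
            return self.vocab.get(tokens, self.unk_token_id)
        return [self.vocab.get(t, self.unk_token_id) for t in tokens]


def resolve_delimiter_ids(
    processor, attrs=("boi_token", "eoi_token", "boa_token", "eoa_token")
):
    """Map each delimiter attribute found on the processor to its token id."""
    tokenizer = processor.tokenizer
    resolved = {}
    for attr in attrs:
        token = getattr(processor, attr, None)
        if token is None:
            continue  # this processor does not expose the delimiter
        token_id = tokenizer.convert_tokens_to_ids(token)
        if token_id == tokenizer.unk_token_id:
            continue  # string is not in the vocabulary; nothing to mask
        resolved[attr] = token_id
    return resolved


# Placeholder token strings and ids, not the real Gemma 4 vocabulary.
processor = SimpleNamespace(
    tokenizer=StubTokenizer({"<boi>": 101, "<eoi>": 102, "<boa>": 103}),
    boi_token="<boi>",
    eoi_token="<eoi>",
    boa_token="<boa>",
    eoa_token="<eoa>",  # deliberately missing from the stub vocabulary
)
delimiter_ids = resolve_delimiter_ids(processor)
```

Resolving ids at construction time, rather than hard-coding them, keeps the masking logic valid if a checkpoint ships a different vocabulary layout.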
Gemma4UnifiedProcessingStrategy
processing_strategies.Gemma4UnifiedProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy for Gemma 4 Unified (encoder-free image/audio/video).
The unified checkpoint shares Gemma 4’s turn format and the same media placeholder/delimiter token set (image/audio/video, boi/eoi/boa/eoa), so boundary detection and label masking are inherited unchanged — both resolve ids dynamically from the processor/tokenizer rather than hard-coding them. The encoder-free raw pixel/waveform projection is handled entirely by the HF processor, so the strategy itself needs no audio/vision-specific logic.
Glm4vProcessingStrategy
processing_strategies.Glm4vProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Shared strategy for Glm4vProcessor (GLM-4V / GLM-4.1V) and Glm46VProcessor (GLM-4.6V / GLM-4.7V), which use identical media-token markers.
Role boundaries unverified; use cfg.role_boundaries to enable masking.
InternVLProcessingStrategy
processing_strategies.InternVLProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy class for InternVL.
Role boundaries NOT declared (InternLM-style template unverified); falls back to pad + image-id masking with a one-shot warning.
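The fallback described above can be pictured with a minimal sketch on plain Python lists. The function name `fallback_labels`, its arguments, and the toy token ids are hypothetical; the value -100 is the default `ignore_index` of PyTorch's cross-entropy loss and is assumed here to be the masking value.

```python
import warnings

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss
_warned_strategies = set()


def fallback_labels(input_ids, pad_token_id, media_token_ids, strategy_name):
    """Copy input_ids to labels, masking only pad and media placeholder ids."""
    if strategy_name not in _warned_strategies:
        # One-shot warning: emitted on the first call per strategy only.
        warnings.warn(
            f"{strategy_name}: no role boundaries declared; "
            "masking pad and media tokens only."
        )
        _warned_strategies.add(strategy_name)
    masked = {pad_token_id, *media_token_ids}
    return [IGNORE_INDEX if tok in masked else tok for tok in input_ids]


# Toy ids: 0 is pad, 9 is the image placeholder.
labels = fallback_labels([5, 9, 9, 7, 3, 0, 0], 0, [9], "InternVLProcessingStrategy")
```

Under this fallback every token that is neither padding nor a media placeholder stays trainable, user turns included, which is why declaring role boundaries is preferable once the template has been verified.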
Llama3_2VisionProcessingStrategy
processing_strategies.Llama3_2VisionProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy class for Llama-3.2 Vision (<|start_header_id|>{role}<|end_header_id|>\n\n ... <|eot_id|>).
Llama4ProcessingStrategy
processing_strategies.Llama4ProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy class for Llama 4 (<|header_start|>{role}<|header_end|>\n\n ... <|eot|>).
Mistral3ProcessingStrategy
processing_strategies.Mistral3ProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy class for Mistral3.
Role boundaries NOT declared (mistral-common instruct tokenizer unverified); same fallback as VoxtralProcessingStrategy.
MistralV7TekkenProcessingStrategy
processing_strategies.MistralV7TekkenProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy class for Mistral v7 Tekken (Pixtral-style plus [SYSTEM_PROMPT]...[/SYSTEM_PROMPT]).
Same [/INST]-shared-marker treatment as PixtralProcessingStrategy.
PixtralProcessingStrategy
processing_strategies.PixtralProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy class for Pixtral ([INST] ... [/INST] user, assistant terminates at eos_token).
[/INST] is shared between user-end and assistant-start. We declare user
with include_end=False so the scanner hands the [/INST] back to
assistant’s start match on the next iteration.
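To make the shared-marker handoff concrete, here is a small sketch of a scanner over string tokens. `Boundary` is a hypothetical stand-in that copies the field names of RoleBoundary, and `label_roles` is an illustrative scanner written for this example; the real scanner operates on token ids and may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Boundary:
    """Hypothetical stand-in mirroring the RoleBoundary fields."""

    role: str
    start_tokens: list
    end_tokens: list = field(default_factory=list)
    include_start: bool = False
    include_end: bool = True


def label_roles(tokens, boundaries):
    """Return, per position, the role whose span covers it (or None)."""
    roles = [None] * len(tokens)
    i = 0
    while i < len(tokens):
        match = next(
            (b for b in boundaries
             if b.start_tokens and tokens[i:i + len(b.start_tokens)] == b.start_tokens),
            None,
        )
        if match is None:
            i += 1
            continue
        body = i + len(match.start_tokens)
        n_end = len(match.end_tokens)
        j = body
        # Advance to the end marker, or to end-of-sequence if there is none.
        while j < len(tokens) and (n_end == 0 or tokens[j:j + n_end] != match.end_tokens):
            j += 1
        found_end = j < len(tokens)
        stop = j + n_end if (found_end and match.include_end) else j
        first = i if match.include_start else body
        for k in range(first, stop):
            roles[k] = match.role
        # With include_end=False the scan resumes ON the end marker, so the
        # same token can be matched as the next role's start marker.
        i = max(stop, i + 1)
    return roles


tokens = ["<s>", "[INST]", "hi", "[/INST]", "hello", "</s>"]
assistant = Boundary("assistant", ["[/INST]"], ["</s>"])
shared = label_roles(tokens, [Boundary("user", ["[INST]"], ["[/INST]"], include_end=False), assistant])
consumed = label_roles(tokens, [Boundary("user", ["[INST]"], ["[/INST]"]), assistant])
```

In this sketch `shared` labels the assistant turn (`hello`, `</s>`), while `consumed` never does: with the default `include_end=True` the user span swallows `[/INST]`, leaving nothing for the assistant start match.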
ProcessingStrategy
processing_strategies.ProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Base Processing Strategy class.
Subclasses opt in to role masking by overriding _build_role_boundaries;
otherwise only pad + media tokens are masked (legacy behavior, one-shot warned).
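The opt-in pattern might look roughly like the following. Both classes are hypothetical stand-ins: the real `_build_role_boundaries` signature and return type are not shown on this page, so the sketch assumes it takes no arguments and returns a list, and it uses plain tuples where the real code would presumably return RoleBoundary objects.

```python
class BaseStrategySketch:
    """Hypothetical stand-in for the base class; not the real implementation."""

    def __init__(self):
        self.role_boundaries = self._build_role_boundaries()

    def _build_role_boundaries(self):
        # Default: declare nothing, which selects the legacy fallback.
        return []

    def masking_mode(self):
        return "role-aware" if self.role_boundaries else "pad+media only"


class ChatMLStrategySketch(BaseStrategySketch):
    """Subclass that opts in by declaring its turn markers."""

    def _build_role_boundaries(self):
        # (role, start marker, end marker) tuples stand in for RoleBoundary.
        return [
            ("user", "<|im_start|>user\n", "<|im_end|>"),
            ("assistant", "<|im_start|>assistant\n", "<|im_end|>"),
        ]


base_mode = BaseStrategySketch().masking_mode()
chatml_mode = ChatMLStrategySketch().masking_mode()
```

Keeping the default empty means a new model family degrades to the legacy masking instead of silently applying boundaries that were never checked against its template.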
Qwen2VLProcessingStrategy
processing_strategies.Qwen2VLProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy class for Qwen2-VL (ChatML <|im_start|>{role}\n ... <|im_end|>).
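For orientation, the ChatML turn shape can be shown at the string level with a short regular expression. This is only an illustration of where an assistant span starts and ends; the strategy itself works on token ids, and the helper `assistant_spans` and the sample conversation are made up for this sketch.

```python
import re

# Header excluded, closing <|im_end|> included, matching the RoleBoundary
# defaults include_start=False and include_end=True.
ASSISTANT_TURN = re.compile(r"<\|im_start\|>assistant\n(.*?<\|im_end\|>)", re.DOTALL)


def assistant_spans(rendered):
    """Return the text of every assistant span in a rendered ChatML prompt."""
    return [m.group(1) for m in ASSISTANT_TURN.finditer(rendered)]


rendered = (
    "<|im_start|>user\nDescribe the image.<|im_end|>\n"
    "<|im_start|>assistant\nA cat on a sofa.<|im_end|>\n"
)
spans = assistant_spans(rendered)
```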
Qwen3_5ProcessingStrategy
processing_strategies.Qwen3_5ProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy class for Qwen3.5 (Qwen2-VL boundaries + <|video_pad|> mask).
RoleBoundary
processing_strategies.RoleBoundary(
role,
start_tokens,
end_tokens=list(),
include_start=False,
include_end=True,
)
One role’s token-level span markers for the masking scanner.
Empty end_tokens means end-of-sequence terminates the span.
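A minimal sketch of how these fields could drive span detection follows. The dataclass below re-declares the documented fields so the example is self-contained, and `find_spans` is a hypothetical helper, not the library's scanner; token ids are toy values.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RoleBoundary:
    """Local re-declaration of the documented fields, for illustration only."""

    role: str
    start_tokens: List[int]
    end_tokens: List[int] = field(default_factory=list)
    include_start: bool = False
    include_end: bool = True


def find_spans(ids: List[int], boundary: RoleBoundary) -> List[Tuple[int, int]]:
    """Return half-open (begin, end) index pairs covered by `boundary`."""
    start, end = boundary.start_tokens, boundary.end_tokens
    if not start:
        raise ValueError("start_tokens must be non-empty")
    spans = []
    i = 0
    while i <= len(ids) - len(start):
        if ids[i:i + len(start)] != start:
            i += 1
            continue
        body = i + len(start)
        begin = i if boundary.include_start else body
        j = body
        if end:
            while j <= len(ids) - len(end) and ids[j:j + len(end)] != end:
                j += 1
        if not end or j > len(ids) - len(end):
            # Empty end_tokens (or no end marker found): run to end-of-sequence.
            spans.append((begin, len(ids)))
            break
        stop = j + len(end) if boundary.include_end else j
        spans.append((begin, stop))
        i = stop
    return spans


# Toy ids: 1 = turn start, 10 = "user", 20 = "assistant", 11 = turn end.
ids = [1, 10, 5, 6, 11, 1, 20, 7, 8, 11]
with_end = find_spans(ids, RoleBoundary("assistant", [1, 20], [11]))
without_end = find_spans(ids, RoleBoundary("assistant", [1, 20], [11], include_end=False))
open_ended = find_spans(ids, RoleBoundary("assistant", [1, 20]))
```

With the defaults, the span skips the role header but keeps the end marker, so the model is trained to emit its own end-of-turn token.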
SmolVLM2ProcessingStrategy
processing_strategies.SmolVLM2ProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy class for SmolVLM2.
Role boundaries NOT declared — SmolVLM2 chat_template varies per checkpoint (HuggingFaceTB ships multiple variants), so we opt out rather than mis-mask.
VoxtralProcessingStrategy
processing_strategies.VoxtralProcessingStrategy(
processor,
chat_template=None,
image_size=None,
image_resize_algorithm=None,
train_on_inputs=False,
roles_to_train=None,
train_on_eos=None,
role_boundaries_override=None,
field_messages=None,
)
Processing Strategy class for Voxtral.
Role boundaries NOT declared — mistral-common instruct tokenizer markers unverified. Falls back to pad+audio masking with a one-shot warning.