Mistral Medium 3.5
Mistral Medium 3.5 is a 128B-parameter dense multimodal model from MistralAI that unifies instruct, reasoning, and agentic capabilities in a single model.
It shares the mistral3 architecture (dense, YaRN RoPE, 256k context) with Ministral 3 and supports the same reasoning_effort toggle as Mistral Small 4.
Thanks to the team at MistralAI for giving us early access to prepare for this release.
Getting started
1. Install Axolotl following the installation guide.
2. Install Cut Cross Entropy to reduce training VRAM usage.
3. (Text config only) Install Flash Attention 4 on Hopper/Blackwell.
4. Run one of the example configs:
# text-only
axolotl train examples/mistral-medium-3_5/qlora-text.yml  # ~83.1 GiB

# text + vision
# wget https://huggingface.co/datasets/Nanobit/text-vision-2k-test/resolve/main/African_elephant.jpg
axolotl train examples/mistral-medium-3_5/qlora-vision.yml  # ~80.3 GiB
Note: vision training does not currently work with Flash Attention 4.
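If you want to adapt the examples to your own data, the key pieces of a QLoRA config look roughly like the sketch below. This is a minimal illustration, not the shipped example file: the base_model repo ID and dataset path are placeholder assumptions, and the LoRA hyperparameters are generic defaults.

base_model: mistralai/Mistral-Medium-3.5  # placeholder; use the actual HF repo ID
load_in_4bit: true        # QLoRA: quantize base weights to 4-bit
adapter: qlora
lora_r: 32
lora_alpha: 64
lora_target_linear: true  # apply LoRA to all linear layers

datasets:
  - path: your/dataset    # placeholder dataset path
    type: chat_template

sequence_len: 4096
micro_batch_size: 1
gradient_accumulation_steps: 4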
Reasoning Effort
The chat template supports a reasoning_effort variable to control the model's reasoning depth:

- "none" - instruct mode (default)
- "high" - reasoning mode with explicit thinking steps
Pass it via chat_template_kwargs under your dataset config:
datasets:
- path: your/dataset
type: chat_template
chat_template_kwargs:
      reasoning_effort: high

Thinking Support
The chat template supports a thinking content type in assistant messages for training on reasoning traces (rendered as [THINK]...[/THINK] blocks).
To use thinking datasets, add the thinking mapping via message_property_mappings:
datasets:
- path: your/thinking-dataset
type: chat_template
message_property_mappings:
role: role
content: content
thinking: thinking
chat_template_kwargs:
      reasoning_effort: high

See the Magistral thinking guide for dataset format details.
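As a rough sketch of what a matching dataset row could look like (the exact schema is described in the guide above; this example is illustrative only), the assistant message carries a thinking field alongside its content:

{
  "messages": [
    {"role": "user", "content": "What is 17 * 24?"},
    {
      "role": "assistant",
      "content": "17 * 24 = 408.",
      "thinking": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408."
    }
  ]
}

With the mapping above, the thinking field is what gets rendered into the [THINK]...[/THINK] block at training time.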
Tips
- For smaller experiments on the same architecture, see examples/ministral3 (Ministral 3, 3B).
- Read more on how to load your own dataset in the docs.
- The text dataset format follows the OpenAI Messages format as seen here.
- The vision model requires multi-modal dataset format as documented here.
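
For orientation, a multimodal row typically pairs image and text parts inside a message's content list. The exact keys are defined in the multimodal docs linked above, so treat this as an illustrative sketch only:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image", "path": "African_elephant.jpg"},
        {"type": "text", "text": "What animal is shown in this image?"}
      ]
    },
    {
      "role": "assistant",
      "content": [{"type": "text", "text": "An African elephant."}]
    }
  ]
}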