MiMo

MiMo is a family of models trained from scratch for reasoning tasks. It incorporates Multiple-Token Prediction (MTP) as an additional training objective to improve performance and speed up inference, and was pre-trained on roughly 25T tokens with a three-stage data mixture strategy designed to increase the density of reasoning patterns in the data.

This guide shows how to fine-tune MiMo with Axolotl on multi-turn conversations with proper loss masking.

Getting started

  1. Install Axolotl following the installation guide.

  2. Run the finetuning example:

    axolotl train examples/mimo/mimo-7b-qlora.yaml

This config uses about 17.2 GiB VRAM. Let us know how it goes. Happy finetuning! 🚀
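
The example config above handles the multi-turn formatting and loss masking mentioned in the intro. As a rough sketch of the settings involved (the model id, dataset path, and hyperparameter values below are illustrative assumptions rather than the exact contents of the shipped config; treat examples/mimo/mimo-7b-qlora.yaml as the source of truth):

    # Illustrative sketch only; check examples/mimo/mimo-7b-qlora.yaml for the real values.
    base_model: XiaomiMiMo/MiMo-7B-Base   # assumed model id; substitute the MiMo checkpoint you want to tune

    # QLoRA: 4-bit base weights plus low-rank adapters
    load_in_4bit: true
    adapter: qlora
    lora_r: 32
    lora_alpha: 16
    lora_dropout: 0.05
    lora_target_linear: true

    # Multi-turn conversations rendered with the tokenizer's chat template;
    # by default only assistant turns contribute to the loss.
    chat_template: tokenizer_default
    datasets:
      - path: ./data/conversations.jsonl   # placeholder path to your own data
        type: chat_template
        roles_to_train: ["assistant"]      # mask everything except assistant replies

    sequence_len: 4096

Masking the non-assistant turns keeps the loss focused on the model's replies rather than on user prompts and system messages.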

Tips

  • You can run a full fine-tune by removing adapter: qlora and load_in_4bit: true from the config.
  • Read more on how to load your own dataset in the docs.
  • The dataset format follows the OpenAI Messages format; a minimal example record is shown below.
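
For reference, a minimal record in that format looks like the following (shown pretty-printed here; in a .jsonl file each record sits on a single line, and the field names messages, role, and content follow the usual OpenAI convention):

    {"messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is 17 * 24?"},
      {"role": "assistant", "content": "17 * 24 = 408."}
    ]}

With the chat_template dataset type, assistant turns are what get trained on by default (controlled via roles_to_train), which provides the masking described in the intro.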

Optimization Guides

Please check the Optimizations doc.

Limitations

Cut Cross Entropy (CCE): Currently not supported. We plan to add CCE support for MiMo in the near future.