• Home
    • Getting Started
      • Quickstart
      • Installation
      • Inference and Merging
      • Model Guides
        • Kimi Linear
        • Plano Orchestrator
        • MiMo
        • InternVL 3.5
        • OLMo 3
        • Trinity
        • Arcee AFM
        • Ministral3
          • Ministral3
          • Ministral 3 Thinking
          • Ministral 3 Vision
        • Magistral
          • Magistral
          • Magistral Thinking
          • Magistral Vision
        • Ministral
        • Mistral Small 3.1/3.2
        • Voxtral
        • Devstral
        • Mistral 7B
        • Llama 4
        • Llama 2
        • Qwen 3 Next
        • Qwen 3
        • Gemma 3n
        • Apertus
        • GPT-OSS
        • Seed-OSS
        • Phi
        • SmolVLM 2
        • Granite 4
        • Liquid Foundation Models 2
        • Hunyuan
        • Jamba
        • Orpheus
      • Command Line Interface (CLI)
      • Telemetry
      • Config Reference
      • API Reference
    • Dataset Formats
      • Pre-training
      • Instruction Tuning
      • Conversation
      • Stepwise Supervised Format
      • Template-Free
      • Custom Pre-Tokenized Dataset
    • Deployments
      • Docker
      • Multi-GPU
      • Multi Node
      • Ray Train
      • AMD GPUs on HPC Systems
      • Mac M-series
    • How To Guides
      • MultiModal / Vision Language Models (BETA)
      • RLHF (Beta)
      • Reward Modelling
      • Learning Rate Groups
      • LoRA Optimizations
      • Dataset Loading
      • Quantization Aware Training (QAT)
      • Quantization with torchao
      • Optimizations Guide
    • Core Concepts
      • Batch size vs Gradient accumulation
      • Dataset Preprocessing
      • Streaming Datasets
      • Multipack (Sample Packing)
      • Mixed Precision Training
      • Optimizers
      • Attention
    • Advanced Features
      • FSDP + QLoRA
      • Unsloth
      • PyTorch ao
      • Custom Integrations
      • Sequence Parallelism
      • Gradient Checkpointing and Activation Offloading
      • N-D Parallelism (Beta)
      • MoE Expert Quantization
    • Troubleshooting
      • FAQ
      • Debugging
      • NCCL