todo list
[ ] Validate parameters and reject combinations that are known not to work
things that are known not to work
FSDP offload combined with gradient_checkpointing (see https://github.com/pytorch/pytorch/issues/82203)
adamw_bnb_8bit doesn’t play well with FSDP offload
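
The validation item in the todo list could be approached with a small pre-flight check over the parsed config. The following is only a minimal sketch, not the project's implementation: the function name `find_known_bad_combinations` is hypothetical, and the config keys it reads (`fsdp_config.fsdp_offload_params`, `gradient_checkpointing`, `optimizer`) are assumptions about how these settings are spelled in the YAML config.

```python
from typing import Any, List, Mapping


def find_known_bad_combinations(cfg: Mapping[str, Any]) -> List[str]:
    """Return one message per known-bad combination found in cfg.

    Hypothetical helper: the key names below are assumed, not confirmed.
    """
    fsdp_config = cfg.get("fsdp_config") or {}
    offload = bool(fsdp_config.get("fsdp_offload_params"))

    problems: List[str] = []
    if offload and cfg.get("gradient_checkpointing"):
        problems.append(
            "FSDP offload with gradient_checkpointing is known not to work "
            "(https://github.com/pytorch/pytorch/issues/82203)"
        )
    if offload and cfg.get("optimizer") == "adamw_bnb_8bit":
        problems.append("adamw_bnb_8bit doesn't play well with FSDP offload")
    return problems


if __name__ == "__main__":
    example_cfg = {
        "fsdp_config": {"fsdp_offload_params": True},
        "gradient_checkpointing": True,
        "optimizer": "adamw_bnb_8bit",
    }
    for message in find_known_bad_combinations(example_cfg):
        print("WARNING:", message)
```

Returning a list of messages rather than raising on the first hit lets the caller decide whether to warn or abort, and reports every conflict in one pass.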