todo list
[ ] Validate parameters and reject combinations that are known not to work
things that are known not to work
FSDP offload combined with gradient_checkpointing (see https://github.com/pytorch/pytorch/issues/82203)
The adamw_bnb_8bit optimizer doesn’t play well with FSDP offload
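
The two combinations listed above could be caught before training starts by a small check like the one sketched below. This is only an illustration of the validation described in the todo list, not existing project code: the function name `find_known_bad_combinations` is hypothetical, and the config keys (`fsdp_config.fsdp_offload_params`, `gradient_checkpointing`, `optimizer`) are assumptions about how the settings are spelled in a loaded config dictionary.

```python
from typing import Any, Dict, List


def find_known_bad_combinations(cfg: Dict[str, Any]) -> List[str]:
    """Return one message per known-bad combination found in cfg.

    Hypothetical helper: the key names below are assumed, not confirmed.
    """
    fsdp_cfg = cfg.get("fsdp_config") or {}
    # Assumed key for enabling FSDP parameter offload.
    offload = bool(fsdp_cfg.get("fsdp_offload_params"))

    problems: List[str] = []
    if offload and cfg.get("gradient_checkpointing"):
        problems.append(
            "FSDP offload + gradient_checkpointing "
            "(https://github.com/pytorch/pytorch/issues/82203)"
        )
    if offload and cfg.get("optimizer") == "adamw_bnb_8bit":
        problems.append("FSDP offload + adamw_bnb_8bit")
    return problems


# Example: a config that triggers both known problems.
example_cfg = {
    "fsdp_config": {"fsdp_offload_params": True},
    "gradient_checkpointing": True,
    "optimizer": "adamw_bnb_8bit",
}
for problem in find_known_bad_combinations(example_cfg):
    print(f"known not to work: {problem}")
```

Returning a list of messages rather than raising on the first match lets a caller report every conflicting setting at once; whether to warn or fail hard is left to the caller.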