# 1.58-bit Finetuning

## Overview
1.58-bit finetuning lets you finetune BitNet models when their prequantized weights are provided. In principle, any LLM could be fine-tuned in the 1.58-bit format, but the performance degradation would be dramatic.
Axolotl supports 1.58-bit finetuning via the onebitllms library, which replaces standard linear layers with BitNet-compatible counterparts ready to use for training.
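For intuition, 1.58-bit (ternary) quantization maps each weight to {-1, 0, +1} together with a per-tensor scale. The sketch below follows the absmean scheme described in the BitNet b1.58 paper; it is illustrative only and is not the `onebitllms` implementation.

```python
import numpy as np

def quantize_ternary(w: np.ndarray, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor scale.

    Illustrative absmean scheme: scale by the mean absolute value,
    then round and clip each entry to the ternary set [-1, 1].
    """
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

# Dequantized weights approximate the originals as q * scale.
w = np.array([[0.9, -0.05, -1.2], [0.3, 0.0, 2.0]])
q, scale = quantize_ternary(w)
```

During BitNet-style training, the quantization is applied on the fly in the forward pass while full-precision "latent" weights receive the gradient updates, which is why the prequantized checkpoints matter.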
> **Note:** LoRA is not supported for BitNet models.
## Installation
Install the onebitllms package before using this feature:
```bash
uv pip install onebitllms
```

Or from source:

```bash
uv pip install git+https://github.com/tiiuae/onebitllms
```

## Supported models
For now, only the Falcon-E series of models is supported. Make sure to use the `-prequantized` variants:

- `tiiuae/Falcon-E-3B-Base-prequantized`
- `tiiuae/Falcon-E-1B-Base-prequantized`

In theory, any other model would "work", but the performance degradation will be huge. This remains an area of exploration.
## Configuration
To enable 1.58-bit finetuning, set the following in your configuration file:
```yaml
base_model: tiiuae/Falcon-E-3B-Base-prequantized # A BitNet-compatible model
use_onebitllms: true
```

For BitNet models, it is recommended to use a higher learning rate than for classic models (usually on the order of 10x).
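Putting this together, a minimal config might look like the following sketch. The dataset and output paths are placeholders, and the learning rate value is illustrative only (roughly 10x what you might use for a comparable full-precision run):

```yaml
base_model: tiiuae/Falcon-E-3B-Base-prequantized
use_onebitllms: true

datasets:
  - path: tatsu-lab/alpaca   # placeholder dataset
    type: alpaca

learning_rate: 1e-4          # illustrative: ~10x a typical full-precision LR
output_dir: ./outputs/falcon-e-1.58bit
```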
## Considerations after training
Once your model has been trained with 1.58-bit fine-tuning, you can convert it to ternary format using the onebitllms CLI:

```bash
onebitllms quantize_to_1bit INPUT_PATH OUTPUT_PATH
```

After that, you can use supported packages such as llama.cpp or Apple's MLX to run the trained model.
## Example Configuration
You can find example configurations in the `examples/falcon-e` directory, which contains one configuration for SFT and one for DPO.