utils.quantization

Utilities for quantization, including quantization-aware training (QAT) and post-training quantization (PTQ), using torchao.

Functions

| Name | Description |
|------|-------------|
| convert_qat_model_for_ptq | Swap the fake-quantized modules in a QAT-trained model back to the original modules, ready for PTQ. |
| get_ptq_config | Build a post-training quantization config. |
| prepare_model_for_qat | Prepare a model for QAT by swapping its linear layers with fake-quantized linear layers. |
| quantize_model_for_ptq | Quantize a model for post-training quantization. |
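
Taken together, these functions support a QAT-then-PTQ workflow. The sketch below is illustrative only: the import path, the TorchIntDType members, the group size of 32, and the load_model()/train() helpers are assumptions, and the functions are assumed to modify the model in place.

```python
# A minimal sketch of the QAT -> PTQ workflow. The import path, the
# TorchIntDType.int4 member, and load_model()/train() are hypothetical.
from utils.quantization import (
    TorchIntDType,
    convert_qat_model_for_ptq,
    prepare_model_for_qat,
    quantize_model_for_ptq,
)

model = load_model()  # hypothetical model-loading helper

# 1. Swap linear layers for fake-quantized ones, then fine-tune so the
#    weights adapt to quantization noise.
prepare_model_for_qat(model, weight_dtype=TorchIntDType.int4, group_size=32)
train(model)  # hypothetical QAT fine-tuning loop

# 2. Swap the fake-quantized modules back to the original modules.
convert_qat_model_for_ptq(model)

# 3. Quantize the trained weights for real.
quantize_model_for_ptq(model, weight_dtype=TorchIntDType.int4, group_size=32)
```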

convert_qat_model_for_ptq

```python
utils.quantization.convert_qat_model_for_ptq(model, *, quantize_embedding=None)
```

Swaps the fake-quantized modules in a model that has been trained with QAT back to the original modules, ready for PTQ.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| model | | The model to convert. | required |
| quantize_embedding | bool \| None | Whether to quantize the model’s embedding weights. | None |
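
For example, converting a model whose embedding weights were also fake-quantized during QAT. This is a minimal sketch: the import path and TorchIntDType member names are assumptions, and in-place modification is assumed.

```python
import torch.nn as nn

# Import path and TorchIntDType member names are assumptions.
from utils.quantization import (
    TorchIntDType,
    convert_qat_model_for_ptq,
    prepare_model_for_qat,
)

model = nn.Sequential(nn.Embedding(100, 32), nn.Linear(32, 32))

# Fake-quantize linear and embedding weights, then (not shown) fine-tune.
prepare_model_for_qat(
    model, weight_dtype=TorchIntDType.int4, group_size=32, quantize_embedding=True
)

# After QAT, restore the original module types, including the embedding,
# so the model is ready for real post-training quantization.
convert_qat_model_for_ptq(model, quantize_embedding=True)
```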

get_ptq_config

```python
utils.quantization.get_ptq_config(
    weight_dtype,
    activation_dtype=None,
    group_size=None,
)
```

Builds a post-training quantization config.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| weight_dtype | TorchIntDType | The dtype to use for weight quantization. | required |
| activation_dtype | TorchIntDType \| None | The dtype to use for activation quantization. | None |
| group_size | int \| None | The group size to use for weight quantization. | None |

Returns

| Type | Description |
|------|-------------|
| AOBaseConfig | The post-training quantization config. |

Raises

| Type | Description |
|------|-------------|
| ValueError | If the activation dtype is not specified and the weight dtype is not int8 or int4, or if the group size is not specified for int8 or int4 weight-only quantization. |
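
Because the return value is a torchao AOBaseConfig, it can be passed to torchao’s quantize_ entry point. Below is a minimal sketch of the two documented modes; the import path and the TorchIntDType member names are assumptions.

```python
import torch.nn as nn
from torchao.quantization import quantize_

# Import path and TorchIntDType member names are assumptions.
from utils.quantization import TorchIntDType, get_ptq_config

# int4 weight-only quantization: a group size is required.
weight_only_cfg = get_ptq_config(weight_dtype=TorchIntDType.int4, group_size=32)

# int8 activation + int8 weight quantization: no group size is needed.
activation_cfg = get_ptq_config(
    weight_dtype=TorchIntDType.int8,
    activation_dtype=TorchIntDType.int8,
)

model = nn.Sequential(nn.Linear(64, 64))
quantize_(model, weight_only_cfg)  # apply the AOBaseConfig via torchao
```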

prepare_model_for_qat

```python
utils.quantization.prepare_model_for_qat(
    model,
    weight_dtype,
    group_size,
    activation_dtype=None,
    quantize_embedding=False,
)
```

Prepares a model for QAT by swapping the model’s linear layers with fake-quantized linear layers and, optionally, the embedding weights with fake-quantized embedding weights.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| model | | The model to quantize. | required |
| weight_dtype | TorchIntDType | The dtype to use for weight quantization. | required |
| group_size | int | The group size to use for weight quantization. | required |
| activation_dtype | TorchIntDType \| None | The dtype to use for activation quantization. | None |
| quantize_embedding | bool | Whether to quantize the model’s embedding weights. | False |

Raises

| Type | Description |
|------|-------------|
| ValueError | If the activation/weight dtype combination is invalid. |
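
A minimal preparation sketch follows. The import path and TorchIntDType member names are assumptions, and the int8-activation/int4-weight pairing mirrors torchao’s common 8da4w QAT recipe but is chosen here only for illustration.

```python
import torch.nn as nn

# Import path and TorchIntDType member names are assumptions.
from utils.quantization import TorchIntDType, prepare_model_for_qat

model = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 64))

# Swap in fake-quantized linear (and embedding) modules before fine-tuning.
prepare_model_for_qat(
    model,
    weight_dtype=TorchIntDType.int4,
    group_size=32,
    activation_dtype=TorchIntDType.int8,
    quantize_embedding=True,
)
# ... the QAT fine-tuning loop would run here ...
```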

quantize_model_for_ptq

```python
utils.quantization.quantize_model_for_ptq(
    model,
    weight_dtype,
    group_size=None,
    activation_dtype=None,
    quantize_embedding=None,
)
```

Quantizes a model for post-training quantization by swapping the model’s linear layers with quantized linear layers. If quantize_embedding is True, the model’s embedding weights are also quantized.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| model | | The model to quantize. | required |
| weight_dtype | TorchIntDType | The dtype to use for weight quantization. | required |
| group_size | int \| None | The group size to use for weight quantization. | None |
| activation_dtype | TorchIntDType \| None | The dtype to use for activation quantization. | None |
| quantize_embedding | bool \| None | Whether to quantize the model’s embedding weights. | None |
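
For example, applying int8 activation and weight quantization to a trained model. This is a minimal sketch; the import path and TorchIntDType member names are assumptions, and in-place modification is assumed.

```python
import torch.nn as nn

# Import path and TorchIntDType member names are assumptions.
from utils.quantization import TorchIntDType, quantize_model_for_ptq

model = nn.Sequential(nn.Linear(64, 64))  # stands in for a trained model

# int8 activation + int8 weight quantization; no group size is needed
# in this mode (see get_ptq_config above).
quantize_model_for_ptq(
    model,
    weight_dtype=TorchIntDType.int8,
    activation_dtype=TorchIntDType.int8,
)
```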