utils.quantization

Utilities for quantization, including quantization-aware training (QAT) and post-training quantization (PTQ), using torchao.

Functions

| Name | Description |
|------|-------------|
| convert_qat_model | Converts a QAT model with fake quantized layers back to the original model. |
| get_quantization_config | Builds a post-training quantization config. |
| prepare_model_for_qat | Prepares a model for QAT by swapping the model's linear layers with fake quantized equivalents. |
| quantize_model | Quantizes a model. |

convert_qat_model

```python
utils.quantization.convert_qat_model(model, quantize_embedding=False)
```

Converts a QAT model with fake quantized layers back to the original model.
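
Below is a minimal sketch of the QAT round trip this function completes, paired with prepare_model_for_qat (documented further down). It assumes both functions modify the model in place, since no return value is documented, and that TorchAOQuantDType is exported from this module; the enum member name (int4) and the dtype/group-size combination are also illustrative assumptions.

```python
import torch.nn as nn

from utils.quantization import convert_qat_model, prepare_model_for_qat
from utils.quantization import TorchAOQuantDType  # assumed export location

model = nn.Sequential(nn.Linear(256, 256))

# Swap the model's linear layers for fake quantized ones before fine-tuning.
# TorchAOQuantDType.int4 is an assumed member name.
prepare_model_for_qat(model, weight_dtype=TorchAOQuantDType.int4, group_size=32)

# ... QAT fine-tuning loop runs here ...

# After training, swap the fake quantized layers back to the original types.
convert_qat_model(model, quantize_embedding=False)
```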

get_quantization_config

```python
utils.quantization.get_quantization_config(
    weight_dtype,
    activation_dtype=None,
    group_size=None,
)
```

Builds a post-training quantization config.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| weight_dtype | TorchAOQuantDType | The dtype to use for weight quantization. | required |
| activation_dtype | TorchAOQuantDType \| None | The dtype to use for activation quantization. | None |
| group_size | int \| None | The group size to use for weight quantization. | None |

Returns

| Type | Description |
|------|-------------|
| AOBaseConfig | The post-training quantization config. |

Raises

| Type | Description |
|------|-------------|
| ValueError | If the activation dtype is not specified and the weight dtype is not int8 or int4, or if the group size is not specified for int8 or int4 weight only quantization. |

prepare_model_for_qat

```python
utils.quantization.prepare_model_for_qat(
    model,
    weight_dtype,
    group_size=None,
    activation_dtype=None,
    quantize_embedding=False,
)
```

Prepares a model for QAT by swapping the model's linear layers with fake quantized linear layers and, optionally, the embedding weights with fake quantized embedding weights.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| model | | The model to quantize. | required |
| weight_dtype | TorchAOQuantDType | The dtype to use for weight quantization. | required |
| group_size | int \| None | The group size to use for weight quantization. | None |
| activation_dtype | TorchAOQuantDType \| None | The dtype to use for activation quantization. | None |
| quantize_embedding | bool | Whether to quantize the model's embedding weights. | False |

Raises

| Type | Description |
|------|-------------|
| ValueError | If the activation/weight dtype combination is invalid. |

quantize_model

```python
utils.quantization.quantize_model(
    model,
    weight_dtype,
    group_size=None,
    activation_dtype=None,
    quantize_embedding=None,
)
```

Quantizes a model.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| model | | The model to quantize. | required |
| weight_dtype | TorchAOQuantDType | The dtype to use for weight quantization. | required |
| group_size | int \| None | The group size to use for weight quantization. | None |
| activation_dtype | TorchAOQuantDType \| None | The dtype to use for activation quantization. | None |
| quantize_embedding | bool \| None | Whether to quantize the model's embedding weights. | None |
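
A sketch of one-shot weight-only PTQ with this function follows; per the get_quantization_config constraints above, weight-only mode needs an int8 or int4 weight dtype and a group size. As before, the TorchAOQuantDType import location and member name are assumptions, and the model is assumed to be quantized in place.

```python
import torch.nn as nn

from utils.quantization import quantize_model
from utils.quantization import TorchAOQuantDType  # assumed export location

model = nn.Sequential(nn.Linear(256, 256))

# Weight-only int8 PTQ; group_size is required for weight-only quantization,
# and activation_dtype=None selects the weight-only path.
quantize_model(
    model,
    weight_dtype=TorchAOQuantDType.int8,  # assumed member name
    group_size=32,
    activation_dtype=None,
    quantize_embedding=False,
)
```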