monkeypatch.moe_quant

Load-time quantization for mixture-of-experts (MoE) expert weights stored as 3D nn.Parameter tensors.

Classes

Name	Description
Bnb8bitParametrization	Dequantizes int8 row-wise quantized data on access.

monkeypatch.moe_quant.Bnb8bitParametrization(row_stats)

Dequantizes int8 row-wise quantized data on access.

Methods

Name	Description
forward	Flatten 3D+ to 2D for BnB’s dequant, then reshape back.

monkeypatch.moe_quant.Bnb8bitParametrization.forward(quantized_param)

Flatten tensors with three or more dimensions to 2D for BnB’s (bitsandbytes) dequantization, then reshape the result back to the original shape.
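
The flatten, dequantize, reshape sequence can be illustrated without any libraries. The sketch below is not the module's implementation: it assumes the common row-wise absmax scheme, where each 2D row keeps one scale (its absolute maximum) and values map to the int8 range [-127, 127]. All function names are hypothetical, and nested lists stand in for tensors.

```python
def quantize_rowwise(rows):
    """Quantize each row to int8 range using a per-row absmax scale (assumed scheme)."""
    quantized, row_stats = [], []
    for row in rows:
        absmax = max(abs(v) for v in row) or 1.0  # fall back to 1.0 for an all-zero row
        row_stats.append(absmax)
        quantized.append([round(v / absmax * 127) for v in row])
    return quantized, row_stats


def dequantize_rowwise(quantized, row_stats):
    """Invert the row-wise scaling: value = q * absmax / 127."""
    return [[q * absmax / 127 for q in row] for row, absmax in zip(quantized, row_stats)]


def dequantize_3d(quantized_3d, row_stats):
    """Flatten [experts][rows][cols] to 2D, dequantize, then restore the 3D layout."""
    rows_per_expert = len(quantized_3d[0])
    flat = [row for expert in quantized_3d for row in expert]
    deq = dequantize_rowwise(flat, row_stats)
    return [deq[i:i + rows_per_expert] for i in range(0, len(deq), rows_per_expert)]


# Two experts, each holding a 2x3 weight matrix.
weights = [
    [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]],
    [[-0.1, 0.2, 0.4], [1.5, -3.0, 0.75]],
]
flat_rows = [row for expert in weights for row in expert]
quantized_flat, row_stats = quantize_rowwise(flat_rows)
quantized_3d = [quantized_flat[i:i + 2] for i in range(0, len(quantized_flat), 2)]
restored = dequantize_3d(quantized_3d, row_stats)
```

Because the scale is stored per flattened row, the statistics have one entry per (expert, row) pair, which is why the 3D tensor has to be viewed as 2D before the scales are applied.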

Functions

Name	Description
get_moe_quantized_count	Return the number of expert parameters quantized during loading.
patch_moe_quantization_on_load	Patch transformers’ weight loading to quantize MoE expert params on-the-fly.
patch_peft_target_parameters_matching	Fix PEFT’s _inject_parameters for target_parameters on quantized MoE experts.
replace_parameter_8bit	Replace a module parameter with an 8-bit quantized version using parametrization.

monkeypatch.moe_quant.get_moe_quantized_count()

Return the number of expert parameters quantized during loading.

monkeypatch.moe_quant.patch_moe_quantization_on_load(cfg)

Patch transformers’ weight loading to quantize MoE expert params on-the-fly.

monkeypatch.moe_quant.patch_peft_target_parameters_matching()

Fix PEFT’s _inject_parameters for target_parameters on quantized MoE experts.

Expands short suffixes to full module paths for parametrized modules.
Iterates params in definition order (not alphabetical order) so saved adapters are compatible with standard PEFT, vLLM, etc.
Skips ParametrizationList synthetic paths to prevent PEFT from mistakenly targeting quantized expert params via name-suffix matching.
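
The three behaviors above can be sketched in plain Python. This is a hypothetical illustration, not the patched PEFT code: the function name, the example paths, and the assumption that synthetic paths contain a `.parametrizations.` segment are all chosen for the example.

```python
def expand_target_parameters(candidate_paths, targets):
    """Expand short suffixes (e.g. 'down_proj') to full paths, in definition order."""
    matches = []
    for path in candidate_paths:  # iterate as given; deliberately no sorting
        if ".parametrizations." in path:
            # Synthetic path created by parametrization (assumed naming); skip it
            # even though its suffix would otherwise match a target.
            continue
        if any(path == t or path.endswith("." + t) for t in targets):
            matches.append(path)
    return matches


# Hypothetical candidate paths, listed in model definition order.
candidate_paths = [
    "layers.0.experts.gate_up_proj",
    "layers.0.experts.down_proj",
    "layers.0.experts.parametrizations.down_proj",  # would suffix-match 'down_proj'
    "layers.1.experts.gate_up_proj",
    "layers.1.experts.down_proj",
]
expanded = expand_target_parameters(candidate_paths, ["down_proj", "gate_up_proj"])
```

In this sketch `gate_up_proj` stays ahead of `down_proj` within each layer, whereas alphabetical sorting would reverse them and change the order in which adapter weights are created.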

monkeypatch.moe_quant.replace_parameter_8bit(module, param_name)

Replace a module parameter with an 8-bit quantized version using parametrization.
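
As a rough sketch of the parametrization approach, the example below uses PyTorch's `torch.nn.utils.parametrize` with plain torch arithmetic in place of bitsandbytes. `RowwiseInt8Dequant` and `replace_parameter_int8_sketch` are hypothetical stand-ins, and the absmax/127 scaling is an assumption; the real `replace_parameter_8bit` may differ in how it quantizes and stores the data.

```python
import torch
from torch import nn
from torch.nn.utils import parametrize


class RowwiseInt8Dequant(nn.Module):
    """Hypothetical stand-in for Bnb8bitParametrization, using plain torch ops."""

    def __init__(self, row_stats):
        super().__init__()
        self.register_buffer("row_stats", row_stats)  # per-row absmax values

    def forward(self, quantized_param):
        # Flatten to 2D, rescale each row, then restore the original shape.
        flat = quantized_param.reshape(-1, quantized_param.shape[-1]).float()
        dequantized = flat * self.row_stats.view(-1, 1) / 127
        return dequantized.reshape(quantized_param.shape)


def replace_parameter_int8_sketch(module, param_name):
    """Swap a float parameter for int8 storage plus a dequantizing parametrization."""
    weight = getattr(module, param_name).detach()
    flat = weight.reshape(-1, weight.shape[-1])
    row_stats = flat.abs().amax(dim=1).clamp(min=1e-8)
    quantized = torch.round(flat / row_stats.view(-1, 1) * 127).to(torch.int8)
    # int8 tensors cannot require gradients, so the stored parameter is frozen.
    setattr(module, param_name,
            nn.Parameter(quantized.reshape(weight.shape), requires_grad=False))
    # unsafe=True skips the check that forward() preserves dtype (int8 in, float out).
    parametrize.register_parametrization(
        module, param_name, RowwiseInt8Dequant(row_stats), unsafe=True)


experts = nn.Module()
experts.down_proj = nn.Parameter(torch.randn(4, 8, 16))  # (experts, rows, cols)
reference = experts.down_proj.detach().clone()
replace_parameter_int8_sketch(experts, "down_proj")

stored = experts.parametrizations.down_proj.original  # int8 storage
restored = experts.down_proj                          # dequantized on each access
```

After the swap, code that reads `experts.down_proj` still receives a float tensor of the original shape, while the module itself holds only the int8 data and the per-row statistics.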