monkeypatch.moe_quant

Loading-time quantization for MoE expert weights stored as 3D nn.Parameter tensors.

Classes

Name Description
Bnb8bitParametrization Dequantizes int8 row-wise quantized data on access.

Bnb8bitParametrization

monkeypatch.moe_quant.Bnb8bitParametrization(row_stats)

Dequantizes int8 row-wise quantized data on access.

Methods

Name Description
forward Flatten 3D+ to 2D for BnB’s dequant, then reshape back.

forward

monkeypatch.moe_quant.Bnb8bitParametrization.forward(quantized_param)

Flattens input with three or more dimensions to 2D for bitsandbytes (BnB) dequantization, then reshapes the result back to the original shape.
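The flatten, dequantize, reshape idea can be illustrated without any dependencies. The sketch below is not the module's implementation: it uses nested Python lists in place of tensors, and it assumes the common absmax convention in which each row is scaled by its largest absolute value and mapped onto the range -127..127. The helper names `quantize_rowwise` and `dequantize_rowwise` are hypothetical.

```python
def quantize_rowwise(weight):
    """Quantize a 3D nested list [experts][rows][cols] with one scale per row."""
    n_experts, n_rows = len(weight), len(weight[0])
    # Flatten 3D -> 2D so every row gets its own statistic.
    flat = [row for expert in weight for row in expert]
    # Per-row absolute maximum; use 1.0 for all-zero rows to avoid dividing by zero.
    row_stats = [max(abs(v) for v in row) or 1.0 for row in flat]
    q_flat = [[round(v * 127 / s) for v in row] for row, s in zip(flat, row_stats)]
    # Reshape 2D -> 3D.
    quantized = [q_flat[e * n_rows:(e + 1) * n_rows] for e in range(n_experts)]
    return quantized, row_stats


def dequantize_rowwise(quantized, row_stats):
    """Invert quantize_rowwise: flatten to 2D, rescale each row, reshape back."""
    n_experts, n_rows = len(quantized), len(quantized[0])
    flat = [row for expert in quantized for row in expert]
    deq_flat = [[q * s / 127 for q in row] for row, s in zip(flat, row_stats)]
    return [deq_flat[e * n_rows:(e + 1) * n_rows] for e in range(n_experts)]


# Two "experts", each a 2x2 weight matrix (assumed toy data).
weight = [[[0.5, -1.0], [0.25, 0.125]], [[2.0, 0.0], [0.0, 0.0]]]
quantized, row_stats = quantize_rowwise(weight)
restored = dequantize_rowwise(quantized, row_stats)
```

The restored values differ from the originals only by rounding error, which is bounded by each row's statistic divided by 254.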

Functions

Name Description
get_moe_quantized_count Return the number of expert parameters quantized during loading.
patch_moe_quantization_on_load Patch transformers’ weight loading to quantize MoE expert params on-the-fly.
patch_peft_target_parameters_matching Fix PEFT’s _inject_parameters for target_parameters on quantized MoE experts.
replace_parameter_8bit Replace a module parameter with an 8-bit quantized version using parametrization.

get_moe_quantized_count

monkeypatch.moe_quant.get_moe_quantized_count()

Return the number of expert parameters quantized during loading.

patch_moe_quantization_on_load

monkeypatch.moe_quant.patch_moe_quantization_on_load(cfg)

Patch transformers’ weight loading to quantize MoE expert params on-the-fly.
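As a rough illustration of the pattern (not the actual patch), the sketch below wraps a loader method so that expert weights are quantized as they are stored, and keeps a counter comparable in spirit to `get_moe_quantized_count`. Everything in it is hypothetical: `ToyLoader`, `set_param`, the name-based expert check, and the dict used in place of a tensor are assumptions made so the example runs on its own.

```python
import functools


class ToyLoader:
    """Hypothetical stand-in for a framework's weight loader."""

    def __init__(self):
        self.params = {}

    def set_param(self, name, tensor):
        self.params[name] = tensor


_quantized_count = 0


def get_quantized_count():
    """Return how many expert parameters the patched loader has quantized."""
    return _quantized_count


def patch_quantization_on_load():
    """Wrap ToyLoader.set_param so 3D expert weights are quantized as they load."""
    if getattr(ToyLoader.set_param, "_is_patched", False):
        return  # already patched; applying twice would double-count
    original = ToyLoader.set_param

    @functools.wraps(original)
    def patched(self, name, tensor):
        global _quantized_count
        # Assumed heuristic: expert weights have "experts" in the name and 3 dims.
        if "experts" in name and len(tensor["shape"]) == 3:
            tensor = {**tensor, "dtype": "int8"}  # placeholder for real quantization
            _quantized_count += 1
        return original(self, name, tensor)

    patched._is_patched = True
    ToyLoader.set_param = patched


patch_quantization_on_load()
loader = ToyLoader()
loader.set_param("layers.0.mlp.experts.down_proj", {"shape": (8, 16, 32), "dtype": "bf16"})
loader.set_param("layers.0.self_attn.q_proj.weight", {"shape": (16, 16), "dtype": "bf16"})
```

Quantizing inside the load path means the full-precision expert weights never need to be held in memory all at once, which is the usual motivation for patching the loader rather than converting after loading.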

patch_peft_target_parameters_matching

monkeypatch.moe_quant.patch_peft_target_parameters_matching()

Fix PEFT’s _inject_parameters for target_parameters on quantized MoE experts.

  1. Expands short suffixes to full module paths for parametrized modules.
  2. Iterates params in definition order (not alphabetical order) so saved adapters are compatible with standard PEFT, vLLM, etc.
  3. Skips ParametrizationList synthetic paths to prevent PEFT from mistakenly targeting quantized expert params via name-suffix matching.
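The three behaviours above can be sketched on plain strings. This is a simplified illustration, not PEFT's or this module's code: `expand_targets` is a hypothetical helper, and the parameter paths are assumed examples of what a parametrized MoE layer might expose.

```python
def expand_targets(named_params, targets):
    """Return full parameter paths that match any short suffix in `targets`.

    `named_params` must list full paths in model definition order.
    """
    expanded = []
    for full_name in named_params:  # keep definition order; no sorted()
        if "parametrizations" in full_name.split("."):
            continue  # synthetic ParametrizationList path, never a valid target
        if any(full_name == t or full_name.endswith("." + t) for t in targets):
            expanded.append(full_name)  # short suffix expanded to the full path
    return expanded


# Assumed example paths; gate_up_proj is defined before down_proj.
names = [
    "model.layers.0.mlp.experts.gate_up_proj",
    "model.layers.0.mlp.experts.down_proj",
    "model.layers.0.mlp.experts.parametrizations.gate_up_proj.original",
    "model.layers.0.mlp.experts.parametrizations.down_proj.original",
]
result = expand_targets(names, ["experts.down_proj", "experts.gate_up_proj"])
```

Here `result` lists `gate_up_proj` before `down_proj`, following the definition order rather than the alphabetical order of the targets, and contains no `parametrizations` paths.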

replace_parameter_8bit

monkeypatch.moe_quant.replace_parameter_8bit(module, param_name)

Replace a module parameter with an 8-bit quantized version using parametrization.