cli.utils.lora_merge

Functions

Name	Description
copy_non_model_files	Copy all non-model files to the output directory.
find_lora_weights	Find corresponding LoRA A and B weights for a given key.
get_model_shards	Find all model shards in the given path.
merge_lora_sharded_efficient	Memory-efficient LoRA merging that processes shards individually without loading the full model into memory.

copy_non_model_files

cli.utils.lora_merge.copy_non_model_files(input_path, output_path, model_shards)

Copy all non-model files to the output directory.

Parameters

Name	Type	Description	Default
input_path	Path	Source directory	required
output_path	Path	Destination directory	required
model_shards	list[Path]	List of model shard files to skip	required
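
The behavior described above can be sketched with the standard library. This is an illustration, not the module's implementation: the name `copy_non_model_files_sketch` is hypothetical, and it assumes that only top-level regular files are copied and that shards are matched by file name.

```python
import shutil
from pathlib import Path


def copy_non_model_files_sketch(
    input_path: Path, output_path: Path, model_shards: list[Path]
) -> list[str]:
    """Copy every regular file in input_path except the listed shards."""
    output_path.mkdir(parents=True, exist_ok=True)
    shard_names = {shard.name for shard in model_shards}
    copied = []
    for item in sorted(input_path.iterdir()):
        # Assumption: subdirectories are skipped and shards are matched by name.
        if item.is_file() and item.name not in shard_names:
            shutil.copy2(item, output_path / item.name)
            copied.append(item.name)
    return copied
```

Files such as the config and tokenizer end up next to the merged shards, while the shard files themselves are left for the merge step to write.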

find_lora_weights

cli.utils.lora_merge.find_lora_weights(lora_state, key, weight_renamings=None)

Find corresponding LoRA A and B weights for a given key.

Also tries keys after applying weight renamings (from transformers v5 conversion mappings) in case the checkpoint key names differ from the runtime model key names used by the LoRA adapter.
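
A minimal sketch of this lookup, under stated assumptions rather than as the actual implementation: adapter keys are assumed to follow the PEFT-style layout `base_model.model.<module>.lora_A.weight` / `lora_B.weight`, and `weight_renamings` is assumed to be a dict of plain substring replacements (the real conversion mappings may be pattern-based). The name `find_lora_weights_sketch` is hypothetical.

```python
def find_lora_weights_sketch(lora_state, key, weight_renamings=None):
    """Return (lora_A, lora_B) for a base weight key, or (None, None) if absent."""
    candidates = [key]
    # Assumption: each renaming is a plain substring replacement (old -> new).
    for old, new in (weight_renamings or {}).items():
        if old in key:
            candidates.append(key.replace(old, new))

    for candidate in candidates:
        module = candidate.removesuffix(".weight")
        # Assumption: PEFT-style adapter key layout.
        a_key = f"base_model.model.{module}.lora_A.weight"
        b_key = f"base_model.model.{module}.lora_B.weight"
        if a_key in lora_state and b_key in lora_state:
            return lora_state[a_key], lora_state[b_key]
    return None, None
```

The original key is tried first, so renamings only come into play when the checkpoint and adapter disagree on naming.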

get_model_shards

cli.utils.lora_merge.get_model_shards(model_path)

Find all model shards in the given path.
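
One way such shard discovery could look is sketched below. The glob patterns are assumptions based on common Hugging Face checkpoint naming, not patterns confirmed by this module, and `get_model_shards_sketch` is a hypothetical name.

```python
from pathlib import Path


def get_model_shards_sketch(model_path: Path) -> list[Path]:
    """Return shard files in sorted order, preferring safetensors over .bin."""
    # Assumption: shards are named like model-00001-of-00002.safetensors
    # or pytorch_model-00001-of-00002.bin.
    for pattern in ("model*.safetensors", "pytorch_model*.bin"):
        shards = sorted(model_path.glob(pattern))
        if shards:
            return shards
    return []
```

Index files such as `model.safetensors.index.json` do not match either pattern, so they are treated as non-model files.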

merge_lora_sharded_efficient

cli.utils.lora_merge.merge_lora_sharded_efficient(
    base_model_path,
    lora_adapter_path,
    output_path,
    device='cpu',
    safe_tensors=True,
    simulate_nf4=False,
    simulate_nf4_experts=False,
    nf4_blocksize=None,
    nf4_double_quant=True,
    trust_remote_code=False,
    dequant=False,
)

Memory-efficient LoRA merging that processes shards individually without loading the full model into memory.

dequant: If True, dequantize every quantized weight and write a bf16 checkpoint (strips quantization_config). The default, False, is format-preserving: LoRA-targeted quantized weights are dequantized, the LoRA delta is folded in, and the result is re-quantized to the same format (fp8 stays fp8, nvfp4 stays nvfp4), so a large quantized base does not double in size.
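
The dequantize, fold, re-quantize round trip can be illustrated with a toy example. The sketch below uses a simple symmetric absmax integer quantizer as a stand-in; it is not fp8 or nvfp4, and the function names are hypothetical. It only shows the control flow that the `dequant` flag selects.

```python
def quantize(values, levels=127):
    """Toy symmetric absmax quantizer: returns (integer codes, scale)."""
    peak = max(abs(v) for v in values)
    scale = peak / levels if peak > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(codes, scale):
    """Map integer codes back to floating-point values."""
    return [c * scale for c in codes]


def merge_quantized(codes, scale, delta, dequant=False):
    """Fold a LoRA delta into a quantized weight.

    dequant=True  -> return full-precision values.
    dequant=False -> re-quantize to the same (toy) format.
    """
    merged = [w + d for w, d in zip(dequantize(codes, scale), delta)]
    if dequant:
        return merged
    return quantize(merged)
```

In the format-preserving branch the output has the same storage cost as the input, at the price of one extra rounding step on the merged values.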

Parameters

Name	Type	Description	Default
simulate_nf4	bool	Apply NF4 roundtrip to ALL weight tensors (for QLoRA)	`False`
simulate_nf4_experts	bool	Apply NF4 roundtrip only to MoE expert tensors (for quantize_moe_experts). Expert tensors are identified by having “expert” in the key name and ndim >= 3.	`False`
trust_remote_code	bool	Whether to trust remote code when loading model config for layer-type introspection. Defaults to False for safety.	`False`
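
The expert-tensor rule stated for simulate_nf4_experts can be written as a small predicate. This is a sketch of the stated rule only; the helper name `is_moe_expert_tensor` and the example keys are hypothetical, and the shape is passed as a plain tuple so the example runs without a tensor library.

```python
def is_moe_expert_tensor(key: str, shape: tuple[int, ...]) -> bool:
    """True when the key mentions "expert" and the tensor has at least 3 dims."""
    return "expert" in key and len(shape) >= 3


# A stacked expert weight (num_experts, out_features, in_features) qualifies.
print(is_moe_expert_tensor("layers.0.mlp.experts.gate_up_proj", (8, 64, 32)))
# A 2-D router weight does not, even though its key contains "expert".
print(is_moe_expert_tensor("layers.0.mlp.expert_router.weight", (8, 32)))
```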