scripts.vllm_serve_lora

vLLM serve script with native LoRA adapter support.

Extends TRL’s vllm_serve to enable direct LoRA adapter loading in vLLM, instead of merging adapter weights into the base model before syncing.
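Loading the adapter directly is possible because LoRA's low-rank update can be applied either way. Merging computes y = (W + s·BA)x, while native serving keeps W frozen and computes y = Wx + s·B(Ax), where s = alpha / r. The toy check below uses made-up 2×2 numbers (illustrative only, not taken from axolotl or vLLM) in plain Python to show that both forms give the same output.

```python
def matmul(X, Y):
    """Plain-Python matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(M, v):
    """Plain-Python matrix-vector product M @ v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


W = [[1.0, 2.0], [3.0, 4.0]]   # frozen base weight (d_out x d_in)
A = [[0.5, -1.0]]              # LoRA down-projection (r x d_in), r = 1
B = [[2.0], [1.0]]             # LoRA up-projection (d_out x r)
scale = 2.0 / 1                # alpha / r, with an illustrative alpha = 2
x = [1.0, 1.0]

# Merge-sync style: fold the update into W, then the whole merged matrix must be shipped.
delta = matmul(B, A)
W_merged = [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))] for i in range(len(W))]
y_merged = matvec(W_merged, x)

# Native LoRA style: leave W untouched and add the low-rank path at inference time.
base = matvec(W, x)
low_rank = matvec(B, matvec(A, x))
y_runtime = [b + scale * l for b, l in zip(base, low_rank)]

print(y_merged, y_runtime)  # [1.0, 6.0] [1.0, 6.0]
```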

Usage

Set vllm.serve_module: axolotl.scripts.vllm_serve_lora in your config, or set trl.vllm_lora_sync: true to select this module automatically.
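The two config routes can be pictured as a small selection rule. The sketch below is a hypothetical helper, not axolotl's code: it assumes the config is a nested dict with the keys named above, and it assumes an explicit vllm.serve_module takes precedence over trl.vllm_lora_sync. It returns None when neither is set, leaving the default module unspecified.

```python
from typing import Optional

LORA_SERVE_MODULE = "axolotl.scripts.vllm_serve_lora"


def resolve_serve_module(cfg: dict) -> Optional[str]:
    """Hypothetical helper: pick the vLLM serve module from an axolotl-style config dict."""
    # `or {}` also covers a YAML key that is present but empty (parsed as None).
    explicit = (cfg.get("vllm") or {}).get("serve_module")
    if explicit:
        return explicit  # assumption: an explicit module always wins
    if (cfg.get("trl") or {}).get("vllm_lora_sync"):
        return LORA_SERVE_MODULE  # auto-select the LoRA-aware serve script
    return None  # fall back to whatever default the trainer uses (not specified here)


print(resolve_serve_module({"trl": {"vllm_lora_sync": True}}))
```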

Benefits over merge-sync

Syncs only the LoRA adapter weights, through the filesystem, instead of the full merged model over NCCL
vLLM applies LoRA natively (Punica kernels)
No NCCL communicator is needed for weight sync
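To give a sense of scale for the first point: for one d_out × d_in projection, LoRA stores A (r × d_in) and B (d_out × r), which is r·(d_in + d_out) parameters instead of d_in·d_out. The arithmetic below assumes an illustrative 4096 × 4096 projection and rank 64 (the default max_lora_rank). It compares a single projection only. A merged-model sync also transfers every layer that has no adapter, so the real gap is larger.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    # A is (rank x d_in) and B is (d_out x rank).
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    # The merged projection is a dense (d_out x d_in) matrix.
    return d_in * d_out


d = 4096    # illustrative hidden size (assumption)
rank = 64   # matches the default max_lora_rank
full = full_param_count(d, d)
lora = lora_param_count(d, d, rank)
ratio = full // lora
bf16_bytes = lora * 2  # bfloat16 uses 2 bytes per parameter

print(full, lora, ratio, bf16_bytes)  # 16777216 524288 32 1048576
```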

Classes

Name	Description
LoRAScriptArguments	Extended script arguments with LoRA support.

LoRAScriptArguments

scripts.vllm_serve_lora.LoRAScriptArguments(
    enable_lora=True,
    max_lora_rank=64,
    max_loras=2,
    lora_dtype='bfloat16',
    worker_extension_cls='trl.scripts.vllm_serve.WeightSyncWorkerExtension',
)

Extended script arguments with LoRA support.
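The LoRA-specific fields appear to share their names with vLLM engine arguments (enable_lora, max_lora_rank, max_loras, lora_dtype, worker_extension_cls), so they can presumably be forwarded to the engine unchanged. The dataclass below is a hypothetical stand-in that mirrors only the documented fields and defaults. It is not the real class, which also carries TRL's base script arguments. The accepts_adapter_rank helper is also hypothetical. It reflects the expectation that vLLM rejects adapters whose rank exceeds max_lora_rank.

```python
from dataclasses import asdict, dataclass


@dataclass
class LoRAScriptArgumentsSketch:
    """Hypothetical stand-in mirroring the documented LoRA fields and their defaults."""

    enable_lora: bool = True
    max_lora_rank: int = 64
    max_loras: int = 2  # how many adapters the engine can keep active at once
    lora_dtype: str = "bfloat16"
    worker_extension_cls: str = "trl.scripts.vllm_serve.WeightSyncWorkerExtension"

    def to_llm_kwargs(self) -> dict:
        # Field names match the engine arguments, so forward them as-is.
        return asdict(self)

    def accepts_adapter_rank(self, adapter_rank: int) -> bool:
        # An adapter trained with a larger rank than max_lora_rank cannot be loaded.
        return adapter_rank <= self.max_lora_rank


args = LoRAScriptArgumentsSketch()
print(args.to_llm_kwargs())
```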

Functions

Name	Description
llm_worker	Worker process that creates a vLLM LLM with LoRA enabled.
main	Start vLLM workers with LoRA support and the HTTP server.

llm_worker

scripts.vllm_serve_lora.llm_worker(
    script_args,
    data_parallel_rank,
    master_port,
    connection,
)

Worker process that creates a vLLM LLM with LoRA enabled.
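The connection argument suggests the worker is driven over a multiprocessing pipe. The sketch below assumes a command loop modeled on TRL's vllm_serve worker: the worker reports ready, then handles "call", "fire_and_forget", and "shutdown" messages. Both the message format and the FakeLLM class are assumptions for illustration. FakeLLM is a hypothetical stand-in for vllm.LLM so the example runs without a GPU, and a thread stands in for the worker process.

```python
import threading
from multiprocessing import Pipe


class FakeLLM:
    """Hypothetical stand-in for vllm.LLM so the sketch runs anywhere."""

    def __init__(self, **engine_kwargs):
        self.engine_kwargs = engine_kwargs

    def generate(self, prompts):
        return [p.upper() for p in prompts]


def llm_worker_sketch(engine_kwargs: dict, connection) -> None:
    # The real worker builds a vLLM LLM with LoRA enabled; here we build the fake.
    llm = FakeLLM(enable_lora=True, **engine_kwargs)
    connection.send({"status": "ready"})
    while True:
        command = connection.recv()
        if command["type"] == "shutdown":
            break
        method = getattr(llm, command["method"])
        result = method(*command.get("args", ()), **command.get("kwargs", {}))
        if command["type"] == "call":
            connection.send(result)  # "fire_and_forget" commands send nothing back


parent, child = Pipe()
worker = threading.Thread(target=llm_worker_sketch, args=({"max_lora_rank": 64}, child))
worker.start()
ready = parent.recv()
parent.send({"type": "call", "method": "generate", "args": (["hi"],)})
result = parent.recv()
parent.send({"type": "shutdown"})
worker.join()
print(ready, result)  # {'status': 'ready'} ['HI']
```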

main

scripts.vllm_serve_lora.main(script_args)

Start vLLM workers with LoRA support and the HTTP server.
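Given the llm_worker signature (one data_parallel_rank and one connection per worker), a plausible startup pattern for main is: open one pipe per rank, launch the workers, and block until every worker reports ready before accepting HTTP requests. The sketch below is an assumption, not the script's code. fake_worker and start_workers are hypothetical, and threads replace the real worker processes (and their master_port handling) to keep the example self-contained.

```python
import threading
from multiprocessing import Pipe


def fake_worker(rank: int, connection) -> None:
    """Hypothetical worker: report ready, answer calls, exit on shutdown."""
    connection.send({"status": "ready"})
    while True:
        command = connection.recv()
        if command["type"] == "shutdown":
            break
        connection.send(f"rank {rank}: {command['method']}")


def start_workers(data_parallel_size: int):
    connections, workers = [], []
    for rank in range(data_parallel_size):
        parent, child = Pipe()
        worker = threading.Thread(target=fake_worker, args=(rank, child))
        worker.start()
        connections.append(parent)
        workers.append(worker)
    # Readiness barrier: do not serve traffic until every worker is up.
    for rank, conn in enumerate(connections):
        msg = conn.recv()
        if msg != {"status": "ready"}:
            raise RuntimeError(f"worker {rank} failed to start: {msg}")
    return connections, workers


connections, workers = start_workers(data_parallel_size=2)
# At this point the real main() would start its HTTP server.
for conn in connections:
    conn.send({"type": "call", "method": "generate"})
results = [conn.recv() for conn in connections]
for conn in connections:
    conn.send({"type": "shutdown"})
for worker in workers:
    worker.join()
print(results)  # ['rank 0: generate', 'rank 1: generate']
```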