scripts.vllm_serve_lora
vLLM serve script with native LoRA adapter support.
Extends TRL’s `vllm_serve` so that vLLM loads LoRA adapters directly, instead of requiring the adapter weights to be merged into the base model before each sync.
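Merging the adapter before syncing and applying it natively at inference time compute the same function, which is why shipping only the adapter is enough. The pure-Python sketch below checks this on a tiny layer, assuming the standard LoRA update with scaling `lora_alpha / r` (as in PEFT). The matrices and the `merged_forward` / `native_forward` helpers are illustrative only and are not part of this module.

```python
from fractions import Fraction


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in m]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(r * c for r, c in zip(row, col)) for col in cols] for row in a]


def merged_forward(w, a, b, x, scaling):
    """Merge-sync path: fold scaling * B @ A into W, then apply the merged weight."""
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) gives a d_out x d_in update
    merged = [[w_ij + scaling * d_ij for w_ij, d_ij in zip(w_row, d_row)]
              for w_row, d_row in zip(w, delta)]
    return matvec(merged, x)


def native_forward(w, a, b, x, scaling):
    """Native path: keep W frozen and add the low-rank branch at run time."""
    base = matvec(w, x)
    low_rank = matvec(b, matvec(a, x))  # A first (d_in to r), then B (r to d_out)
    return [y + scaling * z for y, z in zip(base, low_rank)]


if __name__ == "__main__":
    w = [[1, 2, 3], [4, 5, 6]]  # frozen base weight, d_out=2, d_in=3
    a = [[1, 0, -1]]            # LoRA A factor, rank r=1
    b = [[2], [1]]              # LoRA B factor
    x = [1, 2, 3]
    scaling = Fraction(2, 1)    # lora_alpha / r with lora_alpha=2, r=1; Fraction keeps it exact
    print(merged_forward(w, a, b, x, scaling))  # [Fraction(6, 1), Fraction(28, 1)]
    print(native_forward(w, a, b, x, scaling))  # [Fraction(6, 1), Fraction(28, 1)]
```

Exact `Fraction` arithmetic is used so the two paths can be compared with `==` rather than a floating-point tolerance.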
Usage
Set `vllm.serve_module: axolotl.scripts.vllm_serve_lora` in your config,
or set `trl.vllm_lora_sync: true` to have this module selected automatically.
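The two settings amount to a simple selection rule. The function below is a hypothetical illustration of that rule, not Axolotl's actual resolution code: it assumes the dotted keys are nested YAML mappings (so the parsed config is a nested dict), that an explicit `vllm.serve_module` takes precedence over `trl.vllm_lora_sync`, and it returns `None` to mean "use the default serve module".

```python
from typing import Any, Optional

LORA_SERVE_MODULE = "axolotl.scripts.vllm_serve_lora"


def resolve_serve_module(cfg: dict[str, Any]) -> Optional[str]:
    """Hypothetical: pick the vLLM serve module from a parsed config dict."""
    vllm_cfg = cfg.get("vllm") or {}
    trl_cfg = cfg.get("trl") or {}
    # Assumption: an explicitly configured module always wins.
    explicit = vllm_cfg.get("serve_module")
    if explicit:
        return explicit
    # trl.vllm_lora_sync: true selects the LoRA-aware serve script.
    if trl_cfg.get("vllm_lora_sync"):
        return LORA_SERVE_MODULE
    # Neither key is set: the caller falls back to its default serve module.
    return None


if __name__ == "__main__":
    # Equivalent to the YAML "trl:\n  vllm_lora_sync: true"
    print(resolve_serve_module({"trl": {"vllm_lora_sync": True}}))
    # Equivalent to the YAML "vllm:\n  serve_module: axolotl.scripts.vllm_serve_lora"
    print(resolve_serve_module({"vllm": {"serve_module": LORA_SERVE_MODULE}}))
```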
Benefits over merge-sync
- Syncs only the LoRA adapter weights, through the filesystem, instead of the full merged model over NCCL
- vLLM applies LoRA natively (Punica kernels)
- No NCCL communicator is needed for weight sync
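To put the first benefit in numbers: for one linear layer with a `d_out × d_in` weight, a merge-sync ships the full merged matrix, whereas a LoRA sync ships only the factors A (`r × d_in`) and B (`d_out × r`). The helpers below do that arithmetic. The 4096 × 4096 layer is an illustrative size, not taken from this module, and rank 64 is simply the `max_lora_rank` default; since a merge-sync sends the full merged model, layers without adapters add to its cost as well.

```python
def merged_param_count(d_in: int, d_out: int) -> int:
    """Parameters shipped per layer when syncing the merged weight."""
    return d_out * d_in


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters shipped per layer when syncing only A (r x d_in) and B (d_out x r)."""
    return rank * d_in + d_out * rank


def payload_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Payload size; 2 bytes per parameter corresponds to bfloat16."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    d_in = d_out = 4096  # illustrative hidden size
    rank = 64            # same value as the max_lora_rank default
    full = merged_param_count(d_in, d_out)
    lora = lora_param_count(d_in, d_out, rank)
    print(f"merged: {payload_bytes(full) / 2**20:.1f} MiB per layer")  # merged: 32.0 MiB per layer
    print(f"lora:   {payload_bytes(lora) / 2**20:.1f} MiB per layer")  # lora:   1.0 MiB per layer
    print(f"{full // lora}x fewer parameters")                         # 32x fewer parameters
```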
Classes
| Name | Description |
|---|---|
| LoRAScriptArguments | Extended script arguments with LoRA support. |
LoRAScriptArguments
```python
scripts.vllm_serve_lora.LoRAScriptArguments(
    enable_lora=True,
    max_lora_rank=64,
    max_loras=2,
    lora_dtype='bfloat16',
    worker_extension_cls='trl.scripts.vllm_serve.WeightSyncWorkerExtension',
)
```

Extended script arguments with LoRA support.
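The LoRA fields of `LoRAScriptArguments` share their names with vLLM engine arguments (`enable_lora`, `max_lora_rank`, `max_loras`, `lora_dtype`), as does `worker_extension_cls`. The dataclass below is a standalone stand-in with the same defaults, not the real class (which presumably extends TRL's script arguments and carries many more fields). `to_llm_kwargs` and `check_adapter_rank` are hypothetical helpers: the first shows how the values would plausibly be forwarded to `vllm.LLM(...)`, the second one constraint they imply, namely that a trained adapter's rank must not exceed `max_lora_rank`.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class LoRAServeSettings:
    """Hypothetical stand-in mirroring LoRAScriptArguments' LoRA fields and defaults."""
    enable_lora: bool = True
    max_lora_rank: int = 64  # largest adapter rank vLLM will accept
    max_loras: int = 2       # maximum number of adapters in a single batch
    lora_dtype: str = "bfloat16"
    worker_extension_cls: str = "trl.scripts.vllm_serve.WeightSyncWorkerExtension"


def to_llm_kwargs(settings: LoRAServeSettings, model: str) -> dict[str, Any]:
    """Hypothetical: build keyword arguments for vllm.LLM(...) from the settings."""
    kwargs: dict[str, Any] = {"model": model}
    kwargs.update(asdict(settings))
    return kwargs


def check_adapter_rank(adapter_rank: int, settings: LoRAServeSettings) -> None:
    """Hypothetical guard: vLLM refuses adapters whose rank exceeds max_lora_rank."""
    if not settings.enable_lora:
        raise ValueError("LoRA is disabled; set enable_lora=True")
    if adapter_rank > settings.max_lora_rank:
        raise ValueError(
            f"adapter rank {adapter_rank} exceeds max_lora_rank={settings.max_lora_rank}; "
            "raise max_lora_rank or train with a smaller r"
        )
```

For example, `to_llm_kwargs(LoRAServeSettings(), "my-base-model")` yields a dict that could be splatted into `vllm.LLM(**kwargs)`; `"my-base-model"` is a placeholder. Running the rank check on the trainer side fails fast, before a sync is attempted.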
Functions
| Name | Description |
|---|---|
| llm_worker | Worker process that creates a vLLM LLM with LoRA enabled. |
| main | Start vLLM workers with LoRA support and the HTTP server. |
llm_worker
```python
scripts.vllm_serve_lora.llm_worker(
    script_args,
    data_parallel_rank,
    master_port,
    connection,
)
```

Worker process that creates a vLLM LLM with LoRA enabled.
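The signature follows TRL's worker pattern: each data-parallel rank builds an LLM, reports readiness over `connection` (one end of a `multiprocessing` pipe), and then executes method calls sent by the server. The sketch below reproduces that command loop with a hypothetical `StubLLM` in place of `vllm.LLM`, so it runs without a GPU. The message shapes (`"call"`, `"fire_and_forget"`, `"shutdown"`) are modeled on TRL's server and should be treated as an assumption; the sketch also drops `script_args` and `master_port`, and uses a thread instead of a process only to keep the demo self-contained.

```python
import threading
from multiprocessing import Pipe
from multiprocessing.connection import Connection


class StubLLM:
    """Hypothetical stand-in for vllm.LLM(enable_lora=True, ...)."""

    def __init__(self, rank: int):
        self.rank = rank

    def generate(self, prompts, lora_path=None):
        # The real API takes a LoRA request object; a path string keeps the stub simple.
        tag = f"[lora:{lora_path}]" if lora_path else "[base]"
        return [f"rank{self.rank}{tag} {p}" for p in prompts]


def llm_worker_sketch(rank: int, connection: Connection) -> None:
    llm = StubLLM(rank)                   # the real worker builds a LoRA-enabled LLM here
    connection.send({"status": "ready"})  # tell the parent this rank is up
    while True:
        command = connection.recv()
        if command["type"] in ("call", "fire_and_forget"):
            method = getattr(llm, command["method"])
            result = method(*command.get("args", ()), **command.get("kwargs", {}))
            if command["type"] == "call":  # only "call" sends a reply back
                connection.send(result)
        elif command["type"] == "shutdown":
            break


if __name__ == "__main__":
    parent_end, child_end = Pipe()
    worker = threading.Thread(target=llm_worker_sketch, args=(0, child_end), daemon=True)
    worker.start()
    assert parent_end.recv() == {"status": "ready"}
    parent_end.send({"type": "call", "method": "generate", "args": (["hi"],)})
    print(parent_end.recv())  # ['rank0[base] hi']
    parent_end.send({"type": "shutdown"})
    worker.join()
```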
main
```python
scripts.vllm_serve_lora.main(script_args)
```

Start vLLM workers with LoRA support and the HTTP server.
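With more than one data-parallel worker, the server has to fan each generation request out over the worker connections and reassemble the results in order. TRL's server does this by splitting the prompt list into contiguous, nearly equal chunks; the pure-Python sketch below shows that split. `split_evenly` and `merge_results` are hypothetical names, and whether this module reuses TRL's splitting unchanged is an assumption.

```python
def split_evenly(items: list, n: int) -> list[list]:
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    k, r = divmod(len(items), n)
    # The first r chunks each receive one extra item.
    return [items[i * k + min(i, r):(i + 1) * k + min(i + 1, r)] for i in range(n)]


def merge_results(per_worker: list[list]) -> list:
    """Concatenate per-worker outputs in rank order, restoring the prompt order."""
    return [out for chunk in per_worker for out in chunk]


if __name__ == "__main__":
    prompts = ["p0", "p1", "p2", "p3", "p4"]
    chunks = split_evenly(prompts, 2)
    print(chunks)  # [['p0', 'p1', 'p2'], ['p3', 'p4']]
    # Stand-in for sending each chunk to a worker and receiving its outputs.
    outputs = [[p.upper() for p in chunk] for chunk in chunks]
    print(merge_results(outputs))  # ['P0', 'P1', 'P2', 'P3', 'P4']
```

When there are fewer prompts than workers, some chunks come back empty, so the caller must either skip those workers or give them placeholder work.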