scripts.vllm_serve_lora

vLLM serve script with native LoRA adapter support.

Extends TRL’s vllm_serve to enable direct LoRA adapter loading in vLLM, instead of merging adapter weights into the base model before syncing.

Usage

In your config, either set vllm.serve_module: axolotl.scripts.vllm_serve_lora to use this script explicitly, or set trl.vllm_lora_sync: true to have it selected automatically.
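The selection between the two config options can be pictured with a small sketch. The helper name select_serve_module, the fallback module, and the precedence of an explicit vllm.serve_module over trl.vllm_lora_sync are assumptions made for illustration; they are not taken from the axolotl source.

```python
from typing import Any, Mapping

LORA_SERVE_MODULE = "axolotl.scripts.vllm_serve_lora"
# Assumption: module used when neither option is set.
DEFAULT_SERVE_MODULE = "trl.scripts.vllm_serve"


def select_serve_module(cfg: Mapping[str, Any]) -> str:
    """Hypothetical helper: pick a serve module from a parsed config dict."""
    vllm_cfg = cfg.get("vllm") or {}
    trl_cfg = cfg.get("trl") or {}

    # Assumed precedence: an explicit vllm.serve_module always wins.
    explicit = vllm_cfg.get("serve_module")
    if explicit:
        return explicit

    # Otherwise trl.vllm_lora_sync: true auto-selects the LoRA serve script.
    if trl_cfg.get("vllm_lora_sync"):
        return LORA_SERVE_MODULE

    return DEFAULT_SERVE_MODULE
```

Under these assumptions, a config containing only trl.vllm_lora_sync: true resolves to axolotl.scripts.vllm_serve_lora, the same module an explicit vllm.serve_module entry would name.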

Benefits over merge-sync

  • Syncs only the LoRA adapter weights via the filesystem, instead of the full merged model via NCCL
  • vLLM handles LoRA application natively (Punica kernels)
  • No NCCL communicator needed for weight sync
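To give a rough sense of why syncing only the adapter is cheaper, the sketch below compares parameter counts for a single linear layer. It relies on the standard LoRA parameterization (two low-rank factors of shapes d_in x r and r x d_out); the layer size is an illustrative assumption, not a measurement from this script.

```python
def full_weight_params(d_in: int, d_out: int) -> int:
    """Parameters in one dense weight matrix (what a merged sync would send)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A is (d_in x rank), B is (rank x d_out)."""
    return rank * (d_in + d_out)


# Illustrative layer: a 4096 x 4096 projection with rank 64.
full = full_weight_params(4096, 4096)
adapter = lora_params(4096, 4096, rank=64)
ratio = full // adapter
```

For this example layer the adapter holds 524,288 parameters against 16,777,216 for the full matrix, a 32x reduction. Layers the adapter does not target contribute nothing to an adapter sync, so the saving for a whole model is typically larger.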

Classes

Name                 Description
LoRAScriptArguments  Extended script arguments with LoRA support.

LoRAScriptArguments

scripts.vllm_serve_lora.LoRAScriptArguments(
    enable_lora=True,
    max_lora_rank=64,
    max_loras=2,
    lora_dtype='bfloat16',
    worker_extension_cls='trl.scripts.vllm_serve.WeightSyncWorkerExtension',
)

Extended script arguments with LoRA support.
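The LoRA fields appear to mirror vLLM engine arguments of the same names. The sketch below uses a stand-in dataclass (LoRAArgsSketch, hypothetical) rather than the real class, and a hypothetical helper that gathers the LoRA-related fields into keyword arguments. Whether the actual worker forwards them in exactly this way is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class LoRAArgsSketch:
    """Stand-in mirroring the documented defaults; not the real class."""

    enable_lora: bool = True
    max_lora_rank: int = 64
    max_loras: int = 2
    lora_dtype: str = "bfloat16"
    worker_extension_cls: str = "trl.scripts.vllm_serve.WeightSyncWorkerExtension"


def lora_engine_kwargs(args: LoRAArgsSketch) -> Dict[str, Any]:
    """Hypothetical helper: collect LoRA settings as engine keyword arguments."""
    if not args.enable_lora:
        # With LoRA disabled, the rank, slot, and dtype settings are irrelevant.
        return {"enable_lora": False}
    return {
        "enable_lora": True,
        "max_lora_rank": args.max_lora_rank,
        "max_loras": args.max_loras,
        "lora_dtype": args.lora_dtype,
    }


# Example: raise the rank ceiling for an adapter trained with rank 128.
kwargs = lora_engine_kwargs(LoRAArgsSketch(max_lora_rank=128))
```

The value of max_lora_rank needs to be at least the rank of any adapter served, so it is the field most likely to need changing from its default.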

Functions

Name        Description
llm_worker  Worker process that creates a vLLM LLM with LoRA enabled.
main        Start vLLM workers with LoRA support and the HTTP server.

llm_worker

scripts.vllm_serve_lora.llm_worker(
    script_args,
    data_parallel_rank,
    master_port,
    connection,
)

Worker process that creates a vLLM LLM with LoRA enabled.
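The connection parameter suggests a pipe-driven worker: the parent sends commands and the worker runs them against its engine. The sketch below illustrates that general pattern with a fake engine so it runs without vLLM or a GPU. The message schema ("type", "method", "args", "kwargs"), the ready handshake, and the names FakeEngine, worker_loop, and demo are all assumptions for illustration, not the script's actual protocol.

```python
import threading
from multiprocessing import Pipe


class FakeEngine:
    """Stand-in for a vLLM LLM so the sketch runs anywhere."""

    def generate(self, prompts):
        return [p.upper() for p in prompts]


def worker_loop(connection, engine):
    """Serve commands from the parent until a shutdown message arrives."""
    # Tell the parent the engine is constructed and ready (assumed handshake).
    connection.send({"status": "ready"})
    while True:
        command = connection.recv()
        if command["type"] == "shutdown":
            break
        # Look up the requested engine method and invoke it.
        method = getattr(engine, command["method"])
        result = method(*command.get("args", ()), **command.get("kwargs", {}))
        # Only "call" commands expect a reply; other types are fire-and-forget.
        if command["type"] == "call":
            connection.send(result)


def demo():
    parent, child = Pipe()
    # A thread keeps the demo simple; the real worker is a separate process.
    worker = threading.Thread(target=worker_loop, args=(child, FakeEngine()))
    worker.start()
    ready = parent.recv()
    parent.send({"type": "call", "method": "generate", "args": (["hello"],)})
    output = parent.recv()
    parent.send({"type": "shutdown"})
    worker.join()
    return ready, output
```

In a data-parallel setup, one such worker would run per data_parallel_rank, each with its own pipe back to the process hosting the HTTP server.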

main

scripts.vllm_serve_lora.main(script_args)

Start vLLM workers with LoRA support and the HTTP server.