scripts.vllm_worker_ext

Extended vLLM worker extension with batch weight sync support.

Subclasses TRL’s WeightSyncWorkerExtension to add:

- batch_update_named_params: receives multiple params in one call
- Auto-close stale communicator on re-init
- _direct_set_weight: proper handling for stacked (qkv_proj, gate_up_proj) params, including LoRA-wrapped models where vLLM inserts base_layer into the hierarchy
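
The stacked-parameter handling can be pictured with a small name-resolution sketch. Everything here is illustrative: the mapping table follows the fused-projection naming mentioned above (qkv_proj, gate_up_proj), while `STACKED_PARAMS`, `resolve_stacked_name`, and the example parameter names are hypothetical and are not taken from this module's source.

```python
# Hypothetical sketch: map a checkpoint parameter name to the fused name a
# vLLM model might expose, including the LoRA case with an extra base_layer.
STACKED_PARAMS = [
    # (fused name, checkpoint name, shard id)
    ("qkv_proj", "q_proj", "q"),
    ("qkv_proj", "k_proj", "k"),
    ("qkv_proj", "v_proj", "v"),
    ("gate_up_proj", "gate_proj", 0),
    ("gate_up_proj", "up_proj", 1),
]


def resolve_stacked_name(name, model_param_names):
    """Return (target_name, shard_id); shard_id is None for unstacked params."""
    target, shard_id = name, None
    parts = name.split(".")
    for fused, source, sid in STACKED_PARAMS:
        # Match whole path components so "up_proj" never hits "gate_up_proj".
        if source in parts:
            parts[parts.index(source)] = fused
            target, shard_id = ".".join(parts), sid
            break
    if target in model_param_names:
        return target, shard_id
    # LoRA-wrapped case: retry with base_layer inserted before the last component.
    prefix, _, leaf = target.rpartition(".")
    candidate = f"{prefix}.base_layer.{leaf}"
    if candidate in model_param_names:
        return candidate, shard_id
    raise KeyError(f"no matching parameter for {name!r}")


# Example parameter names (assumed, for illustration only).
model_names = {
    "model.layers.0.self_attn.qkv_proj.base_layer.weight",
    "model.layers.0.mlp.gate_up_proj.weight",
    "model.norm.weight",
}
print(resolve_stacked_name("model.layers.0.self_attn.k_proj.weight", model_names))
```

The shard id tells the caller which slice of the fused tensor the incoming weight belongs to; how that slice is actually written is left to the real _direct_set_weight implementation.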

Classes

Name                            Description
BatchWeightSyncWorkerExtension  Worker extension that adds batch weight update and direct weight setting.

BatchWeightSyncWorkerExtension

scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension()

Worker extension that adds batch weight update and direct weight setting.

Methods

Name                       Description
batch_update_named_params  Receive and apply multiple weight tensors in sequence.
http_load_weight           Load a single weight received via HTTP (no NCCL needed).
http_load_weights          Load weights received via HTTP (no NCCL needed).
http_load_weights_batch    Load multiple weights in a single IPC call.
init_communicator          Auto-close stale communicator before re-initializing.
update_named_param         Override to use _direct_set_weight instead of load_weights.

batch_update_named_params
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.batch_update_named_params(
    params_list,
)

Receive and apply multiple weight tensors in sequence.

Parameters
Name         Type                          Description                               Default
params_list  list[tuple[str, str, tuple]]  List of (name, dtype_str, shape) tuples.  required
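
As a rough illustration of the params_list shape, the sketch below builds a list of (name, dtype_str, shape) tuples from plain metadata. The helper `build_params_list`, the parameter names, and the "torch.bfloat16" dtype strings are assumptions made for the example; the JSON round trip is likewise only an assumed transport, included because JSON turns tuples into lists.

```python
import json


def build_params_list(specs):
    """Normalize (name, dtype, shape) specs into list[tuple[str, str, tuple]]."""
    return [
        (str(name), str(dtype), tuple(int(dim) for dim in shape))
        for name, dtype, shape in specs
    ]


# Hypothetical metadata for two parameters.
specs = [
    ("model.embed_tokens.weight", "torch.bfloat16", [32000, 4096]),
    ("model.norm.weight", "torch.bfloat16", [4096]),
]
params_list = build_params_list(specs)

# If the metadata travels as JSON, shapes arrive as lists and need normalizing.
wire = json.loads(json.dumps(params_list))
restored = build_params_list(wire)
print(restored == params_list)
```

Only metadata is carried in params_list; the tensor data itself is expected to arrive separately, one tensor per entry and in the same order.
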
http_load_weight
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.http_load_weight(
    **kwargs,
)

Load a single weight received via HTTP (no NCCL needed).

Reconstructs the tensor from raw bytes, since tensors don’t survive vLLM’s multiproc IPC serialization. Uses vLLM’s load_weights, which handles TP sharding and stacked-param packing automatically.
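
The raw-bytes round trip can be sketched with the standard library alone. This is a stand-in, not the method's actual code: the payload keys, the two supported dtypes, and the helpers `encode_weight` and `decode_weight` are hypothetical. With PyTorch, a call such as `torch.frombuffer` followed by a reshape would typically play the decoding role.

```python
import struct
from math import prod

# Hypothetical dtype table; this sketch only handles two fixed-width types.
_FORMATS = {"float32": "f", "int32": "i"}


def encode_weight(name, values, dtype, shape):
    """Pack a flat list of values into little-endian bytes plus metadata."""
    if len(values) != prod(shape):
        raise ValueError("value count does not match shape")
    data = struct.pack(f"<{len(values)}{_FORMATS[dtype]}", *values)
    return {"name": name, "dtype": dtype, "shape": list(shape), "data": data}


def decode_weight(payload):
    """Rebuild (name, flat_values, shape) from the bytes and metadata."""
    code = _FORMATS[payload["dtype"]]
    count = prod(payload["shape"])
    expected = count * struct.calcsize("<" + code)
    if len(payload["data"]) != expected:
        raise ValueError("byte length does not match dtype and shape")
    flat = list(struct.unpack(f"<{count}{code}", payload["data"]))
    return payload["name"], flat, tuple(payload["shape"])


payload = encode_weight(
    "model.norm.weight", [1.0, 2.5, -3.0, 0.0, 0.5, 8.0], "float32", (2, 3)
)
name, flat, shape = decode_weight(payload)
print(name, shape, len(payload["data"]))
```

Sending dtype and shape next to the bytes is what makes the reconstruction possible, since a byte string alone carries neither.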

http_load_weights
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.http_load_weights(
    weights,
)

Load weights received via HTTP (no NCCL needed).

http_load_weights_batch
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.http_load_weights_batch(
    params,
)

Load multiple weights in a single IPC call.

Uses vLLM’s load_weights, which handles TP sharding automatically.

init_communicator
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.init_communicator(
    host,
    port,
    world_size,
    client_device_uuid,
)

Auto-close stale communicator before re-initializing.
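
The close-before-reinit pattern might look like the sketch below. `SketchCommunicator` and `SketchWorker` are hypothetical stand-ins with no NCCL or vLLM dependency; only the init_communicator argument names come from the signature above.

```python
class SketchCommunicator:
    """Hypothetical stand-in for a process-group communicator."""

    def __init__(self, host, port, world_size):
        self.host = host
        self.port = port
        self.world_size = world_size
        self.closed = False

    def close(self):
        self.closed = True


class SketchWorker:
    """Hypothetical worker that tolerates repeated init_communicator calls."""

    def __init__(self):
        self.communicator = None

    def init_communicator(self, host, port, world_size, client_device_uuid):
        # client_device_uuid is accepted to mirror the documented signature;
        # this sketch does not use it.
        stale = self.communicator
        if stale is not None:
            # Release the previous communicator before creating a new one.
            stale.close()
            self.communicator = None
        self.communicator = SketchCommunicator(host, port, world_size)
        return self.communicator


worker = SketchWorker()
first = worker.init_communicator("127.0.0.1", 51216, 2, "uuid-0")
second = worker.init_communicator("127.0.0.1", 51216, 2, "uuid-0")
print(first.closed, second.closed)
```

Closing the stale handle first means a trainer that restarts can call init_communicator again without a separate cleanup call.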

update_named_param
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.update_named_param(
    name,
    dtype,
    shape,
)

Override to use _direct_set_weight instead of load_weights.