scripts.vllm_worker_ext

Extended vLLM worker extension with batch weight sync support.

Subclasses TRL’s WeightSyncWorkerExtension to add:

- batch_update_named_params: receives multiple parameters in one call.
- Automatic closing of a stale communicator on re-initialization.
- _direct_set_weight: correct handling of stacked parameters (qkv_proj, gate_up_proj), including LoRA-wrapped models, where vLLM inserts base_layer into the module hierarchy.
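The stacked-parameter handling in the _direct_set_weight item above can be illustrated with a minimal, pure-Python sketch. It assumes vLLM-style packing (HF q_proj/k_proj/v_proj rows stacked into qkv_proj, gate_proj/up_proj into gate_up_proj, mirroring the stacked_params_mapping pattern used in vLLM model loaders) and assumes LoRA wrapping moves a weight from <module>.weight to <module>.base_layer.weight. The helpers resolve_target, shard_row_range, and direct_set are hypothetical, plain lists of rows stand in for tensors, and tensor-parallel slicing is ignored; this is not the module's actual _direct_set_weight.

```python
# Minimal sketch (hypothetical helpers): map an HF checkpoint name onto a vLLM stacked
# parameter, follow LoRA's assumed ".base_layer" wrapping, and write the shard's rows in place.
# Plain lists of rows stand in for tensors; tensor-parallel slicing is ignored.

STACKED_PARAMS_MAPPING = [
    # (stacked vLLM name, per-shard HF name, shard id)
    (".qkv_proj", ".q_proj", "q"),
    (".qkv_proj", ".k_proj", "k"),
    (".qkv_proj", ".v_proj", "v"),
    (".gate_up_proj", ".gate_proj", 0),
    (".gate_up_proj", ".up_proj", 1),
]


def resolve_target(hf_name, vllm_params):
    """Return (vLLM parameter name, shard id or None) for an HF parameter name."""
    name, shard_id = hf_name, None
    for stacked, shard, sid in STACKED_PARAMS_MAPPING:
        if shard in hf_name:
            name, shard_id = hf_name.replace(shard, stacked), sid
            break
    if name not in vllm_params:
        # Assumed LoRA layout: "<module>.weight" becomes "<module>.base_layer.weight".
        module, _, leaf = name.rpartition(".")
        wrapped = f"{module}.base_layer.{leaf}"
        if wrapped in vllm_params:
            name = wrapped
    return name, shard_id


def shard_row_range(shard_id, q_rows, kv_rows, inter_rows):
    """Rows [start, end) that one shard occupies inside its stacked weight."""
    return {
        "q": (0, q_rows),
        "k": (q_rows, q_rows + kv_rows),
        "v": (q_rows + kv_rows, q_rows + 2 * kv_rows),
        0: (0, inter_rows),
        1: (inter_rows, 2 * inter_rows),
    }[shard_id]


def direct_set(vllm_params, hf_name, rows, q_rows, kv_rows, inter_rows):
    """Copy `rows` into the matching vLLM parameter in place; return its name."""
    target, shard_id = resolve_target(hf_name, vllm_params)
    if target not in vllm_params:
        raise KeyError(f"no vLLM parameter for {hf_name!r}")
    if shard_id is None:
        start, end = 0, len(vllm_params[target])  # unstacked: overwrite every row
    else:
        start, end = shard_row_range(shard_id, q_rows, kv_rows, inter_rows)
    if end - start != len(rows):
        raise ValueError(f"{hf_name}: expected {end - start} rows, got {len(rows)}")
    vllm_params[target][start:end] = rows
    return target


# Usage: a LoRA-wrapped qkv_proj with 4 query rows and 2 rows each for key and value.
params = {"model.layers.0.self_attn.qkv_proj.base_layer.weight": [[0.0] for _ in range(8)]}
name = direct_set(params, "model.layers.0.self_attn.k_proj.weight", [[1.0], [1.0]],
                  q_rows=4, kv_rows=2, inter_rows=8)
print(name, params[name])
```

The key rows land in slots 4 and 5 of the packed qkv_proj weight even though the HF name says k_proj and the vLLM parameter sits under base_layer, which is the mismatch the item above describes.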

Classes

Name	Description
BatchWeightSyncWorkerExtension	Worker extension that adds batch weight update and direct weight setting.

BatchWeightSyncWorkerExtension

scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension()

Worker extension that adds batch weight update and direct weight setting.

Methods

Name	Description
batch_update_named_params	Receive and apply multiple weight tensors in sequence.
http_load_weight	Load a single weight received via HTTP (no NCCL needed).
http_load_weights	Load weights received via HTTP (no NCCL needed).
http_load_weights_batch	Load multiple weights in a single IPC call.
init_communicator	Auto-close stale communicator before re-initializing.
update_named_param	Override to use _direct_set_weight instead of load_weights.

batch_update_named_params

scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.batch_update_named_params(
    params_list,
)

Receive and apply multiple weight tensors in sequence.

Parameters

Name	Type	Description	Default
params_list	list[tuple[str, str, tuple]]	List of (name, dtype_str, shape) tuples.	required
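The params_list contract can be sketched in plain Python. The sketch assumes that each (name, dtype_str, shape) entry is matched, in list order, by exactly one tensor transfer from the trainer, and that dtype_str is the string form of a torch dtype such as "torch.bfloat16"; both are assumptions, as are the hypothetical names batch_receive and DTYPE_BYTES and the deque standing in for the transfer channel.

```python
from collections import deque
from math import prod

# Assumed dtype_str format: str(torch_dtype), e.g. "torch.bfloat16".
DTYPE_BYTES = {"torch.float32": 4, "torch.float16": 2, "torch.bfloat16": 2}


def batch_receive(params_list, channel, apply_fn):
    """Receive one buffer per (name, dtype_str, shape) entry, in list order, and apply it."""
    applied = []
    for name, dtype_str, shape in params_list:
        expected = DTYPE_BYTES[dtype_str] * prod(shape)
        buf = channel.popleft()  # stands in for one transfer into a preallocated tensor
        if len(buf) != expected:
            raise ValueError(f"{name}: expected {expected} bytes, got {len(buf)}")
        apply_fn(name, dtype_str, tuple(shape), buf)
        applied.append(name)
    return applied


# Usage: the sender must transfer buffers in the same order as params_list.
params_list = [
    ("model.embed_tokens.weight", "torch.bfloat16", (4, 8)),
    ("model.norm.weight", "torch.float32", (8,)),
]
channel = deque([bytes(2 * 4 * 8), bytes(4 * 8)])
received = {}


def record(name, dtype_str, shape, buf):
    received[name] = (dtype_str, shape, len(buf))


order = batch_receive(params_list, channel, record)
```

One call carries the metadata for many parameters; the transfers themselves still proceed one tensor at a time, in list order, so a mismatch between the list and the sender's order shows up as a size error or, worse, a silently misassigned weight.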

http_load_weight

scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.http_load_weight(
    **kwargs,
)

Load a single weight received via HTTP (no NCCL needed).

Reconstructs the tensor from raw bytes, since tensors do not survive vLLM’s multiprocess IPC serialization. Uses vLLM’s load_weights, which handles tensor-parallel (TP) sharding and stacked-parameter packing automatically.
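Reconstructing a tensor from raw bytes can be sketched with the standard library alone. The payload keys (name, dtype, shape, data) and the float32-only encoding are hypothetical, since the documented signature only shows **kwargs. array("f") stands in for a torch tensor and uses native byte order, so sender and receiver are assumed to share endianness.

```python
from array import array
from math import prod


def encode_weight(name, values, shape):
    """Sender side (hypothetical): flatten float32 values into raw bytes plus metadata."""
    return {"name": name, "dtype": "float32", "shape": list(shape),
            "data": array("f", values).tobytes()}


def decode_weight(name, dtype, shape, data):
    """Worker side (hypothetical): rebuild (name, flat values, shape) from raw bytes."""
    if dtype != "float32":
        raise ValueError(f"unsupported dtype {dtype!r} in this sketch")
    flat = array("f")
    flat.frombytes(data)
    shape = tuple(shape)
    if len(flat) != prod(shape):
        raise ValueError(f"{name}: {len(flat)} elements do not fit shape {shape}")
    return name, flat.tolist(), shape


# Usage: the payload travels as plain bytes and metadata, then is unpacked as keyword arguments.
payload = encode_weight("model.norm.weight", [1.0, 2.5, -3.0, 0.5], (2, 2))
name, values, shape = decode_weight(**payload)
```

With PyTorch available, the decode step would typically be torch.frombuffer(bytearray(data), dtype=torch.float32).reshape(shape), after which the (name, tensor) pair can be handed to the model's load_weights.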

http_load_weights

scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.http_load_weights(
    weights,
)

Load weights received via HTTP (no NCCL needed).

http_load_weights_batch

scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.http_load_weights_batch(
    params,
)

Load multiple weights in a single IPC call.

Uses vLLM’s load_weights, which handles tensor-parallel (TP) sharding automatically.
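Batching amounts to decoding every weight in the payload and passing all of them to a single load_weights call. In the sketch below, FakeModel is a hypothetical stand-in for a vLLM model (whose load_weights accepts an iterable of (name, tensor) pairs), the payload dictionaries use hypothetical keys, and lists of floats stand in for tensors.

```python
from array import array


class FakeModel:
    """Hypothetical stand-in: records how many load_weights calls it receives."""

    def __init__(self):
        self.calls = 0
        self.weights = {}

    def load_weights(self, weights):
        self.calls += 1
        for name, values in weights:
            self.weights[name] = values


def load_weights_batch(model, params):
    """Decode every {"name", "data"} payload, then load them all in one call."""
    decoded = []
    for p in params:
        flat = array("f")
        flat.frombytes(p["data"])
        decoded.append((p["name"], flat.tolist()))
    model.load_weights(decoded)
    return len(decoded)


# Usage: three weights, one IPC payload, one load_weights call.
params = [{"name": f"layer{i}.weight", "data": array("f", [float(i)] * 2).tobytes()}
          for i in range(3)]
model = FakeModel()
count = load_weights_batch(model, params)
```

Compared with calling http_load_weight once per parameter, this pays for one IPC round trip for the whole list; the trade-off is that the entire batch must be held in memory at once.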

init_communicator

scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.init_communicator(
    host,
    port,
    world_size,
    client_device_uuid,
)

Auto-close stale communicator before re-initializing.
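The auto-close behavior follows a simple pattern: if a communicator from an earlier session is still attached, close it before creating the new one instead of failing. The sketch uses hypothetical FakeCommunicator and Worker classes; it does not reproduce TRL's or vLLM's actual communicator setup, and client_device_uuid is only stored. Host, port, and UUID values are arbitrary examples.

```python
class FakeCommunicator:
    """Hypothetical stand-in for a NCCL process-group communicator."""

    def __init__(self, host, port, world_size, client_device_uuid):
        self.endpoint = (host, port)
        self.world_size = world_size
        self.client_device_uuid = client_device_uuid
        self.closed = False

    def close(self):
        self.closed = True


class Worker:
    def __init__(self):
        self.communicator = None

    def close_communicator(self):
        if self.communicator is not None:
            self.communicator.close()
            self.communicator = None

    def init_communicator(self, host, port, world_size, client_device_uuid):
        # Auto-close: tear down any stale communicator instead of raising an error.
        self.close_communicator()
        self.communicator = FakeCommunicator(host, port, world_size, client_device_uuid)
        return self.communicator


# Usage: a second init (e.g. after a trainer restart) replaces the first communicator.
worker = Worker()
first = worker.init_communicator("127.0.0.1", 51216, 2, "GPU-0")
second = worker.init_communicator("127.0.0.1", 51217, 2, "GPU-0")
```

Making re-initialization safe to repeat lets a trainer that restarted without closing its previous session reconnect without first issuing an explicit close.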

update_named_param

scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.update_named_param(
    name,
    dtype,
    shape,
)

Override to use _direct_set_weight instead of load_weights.
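The override changes only the final step: after the tensor described by name, dtype, and shape is received, it is written straight into the existing parameter rather than routed through load_weights. A minimal sketch, assuming "direct set" means an in-place copy into the existing storage; BaseExtension, DirectSetExtension, and _receive are hypothetical, and flat lists stand in for tensors.

```python
from math import prod


class BaseExtension:
    """Hypothetical stand-in for the parent class: applies updates via load_weights."""

    def __init__(self, params, incoming):
        self.params = params      # name -> flat list standing in for a tensor
        self.incoming = incoming  # queued flat buffers standing in for the transfer
        self.load_weights_calls = 0

    def _receive(self, dtype, shape):
        # dtype is unused in this sketch; only the element count is checked.
        buf = self.incoming.pop(0)
        if len(buf) != prod(shape):
            raise ValueError(f"expected {prod(shape)} elements, got {len(buf)}")
        return buf

    def load_weights(self, weights):
        self.load_weights_calls += 1
        for name, values in weights:
            self.params[name][:] = values

    def update_named_param(self, name, dtype, shape):
        self.load_weights([(name, self._receive(dtype, shape))])


class DirectSetExtension(BaseExtension):
    """Override: same receive step, but writes straight into the existing parameter."""

    def _direct_set_weight(self, name, values):
        target = self.params[name]
        if len(target) != len(values):
            raise ValueError(f"{name}: size mismatch")
        target[:] = values  # assumed in-place copy into existing storage

    def update_named_param(self, name, dtype, shape):
        self._direct_set_weight(name, self._receive(dtype, shape))


# Usage: the stored object is updated in place and load_weights is never called.
storage = [0.0] * 4
ext = DirectSetExtension({"model.norm.weight": storage}, [[1.0, 2.0, 3.0, 4.0]])
ext.update_named_param("model.norm.weight", "torch.float32", (4,))
```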