scripts.vllm_worker_ext
Extended vLLM worker extension with batch weight sync support.
Subclasses TRL’s WeightSyncWorkerExtension to add:

- batch_update_named_params: receives multiple params in one call
- Auto-close of a stale communicator on re-init
- _direct_set_weight: correct handling for stacked (qkv_proj, gate_up_proj) params, including LoRA-wrapped models where vLLM inserts base_layer into the hierarchy
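The stacked-parameter handling can be illustrated with a name-resolution sketch. It assumes Llama-style naming and reuses the (stacked param, checkpoint shard, shard id) convention that vLLM's Llama loader calls stacked_params_mapping. `resolve_param_name` is a hypothetical helper for illustration, not the extension's actual _direct_set_weight logic.

```python
# Hypothetical sketch: map a checkpoint-style weight name onto the vLLM
# parameter that actually holds it, assuming Llama-style module names.
STACKED_PARAMS_MAPPING = [
    # (stacked param, checkpoint shard, shard id)
    ("qkv_proj", "q_proj", "q"),
    ("qkv_proj", "k_proj", "k"),
    ("qkv_proj", "v_proj", "v"),
    ("gate_up_proj", "gate_proj", 0),
    ("gate_up_proj", "up_proj", 1),
]


def resolve_param_name(name: str, model_param_names: set[str]) -> tuple[str, object]:
    """Return (target_name, shard_id); shard_id is None for unstacked params."""
    target, shard_id = name, None
    for stacked, shard, sid in STACKED_PARAMS_MAPPING:
        if f".{shard}." in name:
            target, shard_id = name.replace(f".{shard}.", f".{stacked}."), sid
            break
    if target in model_param_names:
        return target, shard_id
    # LoRA wrapping moves the original weight under a "base_layer" submodule,
    # e.g. "...qkv_proj.weight" becomes "...qkv_proj.base_layer.weight".
    prefix, _, leaf = target.rpartition(".")
    wrapped = f"{prefix}.base_layer.{leaf}"
    if wrapped in model_param_names:
        return wrapped, shard_id
    raise KeyError(f"no model parameter matches {name!r}")
```

For example, with a LoRA-wrapped qkv_proj, `resolve_param_name("model.layers.0.self_attn.k_proj.weight", names)` resolves to the `base_layer` weight with shard id `"k"`, while an unstacked `o_proj` weight passes through unchanged.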
Classes
| Name | Description |
|---|---|
| BatchWeightSyncWorkerExtension | Worker extension that adds batch weight update and direct weight setting. |
BatchWeightSyncWorkerExtension
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension()

Worker extension that adds batch weight update and direct weight setting.
Methods
| Name | Description |
|---|---|
| batch_update_named_params | Receive and apply multiple weight tensors in sequence. |
| http_load_weight | Load a single weight received via HTTP (no NCCL needed). |
| http_load_weights | Load weights received via HTTP (no NCCL needed). |
| http_load_weights_batch | Load multiple weights in a single IPC call. |
| init_communicator | Auto-close stale communicator before re-initializing. |
| update_named_param | Override to use _direct_set_weight instead of load_weights. |
batch_update_named_params
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.batch_update_named_params(
params_list,
)

Receive and apply multiple weight tensors in sequence.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| params_list | list[tuple[str, str, tuple]] | List of (name, dtype_str, shape) tuples. | required |
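A minimal sketch of the "in sequence" contract, assuming the sender streams one payload per params_list entry, in the same order as the list. `build_params_list`, `apply_in_sequence`, `receive`, and `set_weight` are hypothetical stand-ins; in a real worker the receive step would be a collective (for example an NCCL broadcast into a preallocated tensor of the given dtype and shape).

```python
from collections import deque
from typing import Callable, Iterable

ParamMeta = tuple[str, str, tuple[int, ...]]


def build_params_list(state: dict[str, tuple[str, tuple[int, ...]]]) -> list[ParamMeta]:
    """Trainer side (hypothetical): one (name, dtype_str, shape) entry per weight.

    Dict insertion order is preserved, so payloads must be sent in this order.
    """
    return [(name, dtype_str, tuple(shape)) for name, (dtype_str, shape) in state.items()]


def apply_in_sequence(
    params_list: Iterable[ParamMeta],
    receive: Callable[[str, tuple[int, ...]], object],
    set_weight: Callable[[str, object], None],
) -> None:
    """Worker side (hypothetical): receive each payload, then apply it by name."""
    for name, dtype_str, shape in params_list:
        payload = receive(dtype_str, shape)  # must consume payloads in sender order
        set_weight(name, payload)
```

Sending all metadata in one call saves a round trip per parameter, but it also means a single out-of-order payload would be applied to the wrong name, so both sides must iterate the same list.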
http_load_weight
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.http_load_weight(
**kwargs,
)

Load a single weight received via HTTP (no NCCL needed).
Reconstructs the tensor from raw bytes, since tensors don’t survive
vLLM’s multiprocess IPC serialization. Uses vLLM’s load_weights,
which handles TP sharding and stacked-param packing automatically.
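Because the tensor is rebuilt from raw bytes, the dtype and shape have to travel alongside the buffer. The stdlib-only sketch below assumes a hypothetical JSON-friendly payload (name, dtype, shape, base64 data) and covers only float32, float64, and int64, since the array module has no bfloat16. With PyTorch, the decode step would typically be torch.frombuffer on a bytearray copy followed by .view(shape).

```python
import base64
import math
from array import array

# Hypothetical dtype tags mapped to array typecodes (native byte order).
_TYPECODES = {"float32": "f", "float64": "d", "int64": "q"}


def encode_weight(name, values, dtype, shape):
    """Sender side: flatten values into raw bytes plus dtype/shape metadata."""
    buf = array(_TYPECODES[dtype], values)
    return {
        "name": name,
        "dtype": dtype,
        "shape": list(shape),
        "data": base64.b64encode(buf.tobytes()).decode("ascii"),
    }


def decode_weight(payload):
    """Receiver side: rebuild the flat buffer and check it matches the shape."""
    buf = array(_TYPECODES[payload["dtype"]])
    buf.frombytes(base64.b64decode(payload["data"]))
    expected = math.prod(payload["shape"])
    if len(buf) != expected:
        raise ValueError(f"{payload['name']}: got {len(buf)} elements, expected {expected}")
    return payload["name"], buf, tuple(payload["shape"])
```

Checking the element count against the shape catches a truncated or mislabeled payload before it reaches load_weights.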
http_load_weights
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.http_load_weights(
weights,
)

Load weights received via HTTP (no NCCL needed).
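On the client side, an HTTP-based sync only needs an ordinary POST request carrying the serialized weights. This sketch builds, but does not send, such a request with urllib; the JSON schema and the URL passed in are hypothetical placeholders, not the actual server route or payload format used with this extension.

```python
import base64
import json
import urllib.request
from array import array


def build_weights_request(url, weights):
    """weights: list of (name, float32 values, shape). Hypothetical JSON schema."""
    body = {
        "weights": [
            {
                "name": name,
                "dtype": "float32",
                "shape": list(shape),
                "data": base64.b64encode(array("f", values).tobytes()).decode("ascii"),
            }
            for name, values, shape in weights
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it. Base64 inflates the payload by about a third, which is one reason an HTTP path suits small or infrequent updates better than a full-model sync.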
http_load_weights_batch
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.http_load_weights_batch(
params,
)

Load multiple weights in a single IPC call.
Uses vLLM’s load_weights, which handles TP sharding automatically.
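Packing several weights into one IPC call trades fewer round trips for larger messages. A caller might bound each call with a byte budget, as in this sketch; `chunk_by_bytes` and `max_bytes` are hypothetical and not part of the extension.

```python
from typing import Iterable, Iterator


def chunk_by_bytes(items: Iterable[tuple[str, int]], max_bytes: int) -> Iterator[list[str]]:
    """Group (name, nbytes) items into batches whose total stays within max_bytes.

    An item larger than max_bytes on its own still gets a batch by itself.
    """
    batch: list[str] = []
    total = 0
    for name, nbytes in items:
        if batch and total + nbytes > max_bytes:
            yield batch  # flush before the budget would be exceeded
            batch, total = [], 0
        batch.append(name)
        total += nbytes
    if batch:
        yield batch
```

Each yielded list would then become one http_load_weights_batch call.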
init_communicator
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.init_communicator(
host,
port,
world_size,
client_device_uuid,
)

Auto-close stale communicator before re-initializing.
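The close-before-reinit pattern is simple to state in code. This sketch uses a hypothetical `FakeCommunicator` in place of a real NCCL communicator and omits `client_device_uuid`; it only shows the ordering, so that a trainer restart does not leave the previous group open (the parent class may otherwise refuse to re-initialize).

```python
class FakeCommunicator:
    """Hypothetical stand-in for a real collective communicator."""

    def __init__(self, host: str, port: int, world_size: int):
        self.addr = (host, port, world_size)
        self.closed = False

    def close(self) -> None:
        self.closed = True


class Worker:
    def __init__(self) -> None:
        self.communicator = None

    def init_communicator(self, host: str, port: int, world_size: int) -> None:
        # Tear down any communicator left over from a previous trainer session.
        if self.communicator is not None:
            self.communicator.close()
            self.communicator = None
        self.communicator = FakeCommunicator(host, port, world_size)
```

Calling `init_communicator` twice closes the first communicator and leaves only the new one active.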
update_named_param
scripts.vllm_worker_ext.BatchWeightSyncWorkerExtension.update_named_param(
name,
dtype,
shape,
)

Override to use _direct_set_weight instead of load_weights.
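Writing directly into a stacked parameter means locating the shard's rows inside the fused weight. The sketch assumes tensor-parallel size 1 and the usual fused layout, with q, k, v (or gate, up) concatenated along the output dimension; `stacked_row_offset` and `direct_set_rows` are hypothetical helpers operating on plain lists, where a real implementation would do something like `param.data[start:start + n].copy_(tensor)`.

```python
def stacked_row_offset(shard_id, *, num_heads, num_kv_heads, head_dim, intermediate_size):
    """Return (start_row, num_rows) of a shard inside a fused weight (TP size 1)."""
    q_rows = num_heads * head_dim
    kv_rows = num_kv_heads * head_dim
    layout = {
        "q": (0, q_rows),
        "k": (q_rows, kv_rows),
        "v": (q_rows + kv_rows, kv_rows),
        0: (0, intermediate_size),                  # gate_proj half of gate_up_proj
        1: (intermediate_size, intermediate_size),  # up_proj half of gate_up_proj
    }
    return layout[shard_id]


def direct_set_rows(stacked, shard_rows, shard_id, **dims):
    """Overwrite only the rows belonging to shard_id, leaving other shards intact."""
    start, n = stacked_row_offset(shard_id, **dims)
    if len(shard_rows) != n:
        raise ValueError(f"shard {shard_id!r} expects {n} rows, got {len(shard_rows)}")
    stacked[start:start + n] = shard_rows
```

With grouped-query attention (num_kv_heads smaller than num_heads), the k and v blocks are smaller than q, so the offsets cannot be computed by splitting the fused weight into equal thirds.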