utils.weight_serde
Serialize / deserialize tensors for HTTP and IPC weight sync.
NumPy doesn’t support bfloat16, so bf16 tensors are cast to fp16 on the wire and reconstructed at the destination. All encode/decode helpers live here so the logic isn’t duplicated across trl_vllm.py, vllm_serve_lora.py, and vllm_worker_ext.py.
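A minimal sketch of the bf16-to-fp16 wire cast described above, assuming PyTorch and NumPy are installed. The helpers to_wire_array and from_wire_array are hypothetical illustrations, not functions exported by this module.

```python
import numpy as np
import torch


def to_wire_array(t: torch.Tensor) -> np.ndarray:
    """Hypothetical helper: turn a tensor into a NumPy array safe for transport."""
    t = t.detach().cpu()
    if t.dtype == torch.bfloat16:
        # NumPy has no bfloat16 dtype; fp16 has the same 2-byte width,
        # so the payload size does not change.
        t = t.to(torch.float16)
    return t.contiguous().numpy()


def from_wire_array(arr: np.ndarray, original: torch.dtype) -> torch.Tensor:
    """Hypothetical helper: rebuild the tensor and restore its original dtype."""
    return torch.from_numpy(arr.copy()).to(original)


w = torch.tensor([1.5, -2.25, 0.125], dtype=torch.bfloat16)
arr = to_wire_array(w)
restored = from_wire_array(arr, torch.bfloat16)
print(arr.dtype)                 # float16
print(torch.equal(restored, w))  # True
```

Because fp16 keeps 10 mantissa bits to bf16's 7, bf16 values inside fp16's normal range (magnitudes from about 6.1e-5 up to 65504) survive the round trip exactly. fp16 has a much narrower exponent range, though: larger magnitudes overflow to inf, and very small ones can lose precision or flush to zero.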
Functions
| Name | Description |
|---|---|
| decode_from_http | Decode an HTTP-encoded weight entry back to a named tensor. |
| decode_from_ipc | Decode an IPC-encoded weight entry back to a named tensor. |
| encode_for_http | Encode a named parameter for JSON transport over HTTP. |
| encode_for_ipc | Encode a tensor for vLLM’s multiproc IPC (raw bytes, no base64). |
decode_from_http
utils.weight_serde.decode_from_http(entry)
Decode an HTTP-encoded weight entry back to a named tensor.
Infers the wire dtype from the byte count per element (bf16 arrives as fp16) and
casts the result back to the original dtype stored in entry["dtype"].
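The byte-count inference can be sketched as follows, assuming PyTorch and NumPy. decode_from_http_sketch is a hypothetical re-implementation, not the module's code; it assumes entry["dtype"] holds a name such as "bfloat16" (with or without a "torch." prefix) and that weights are floating point, so a per-element width of 2, 4, or 8 bytes maps to exactly one wire dtype.

```python
import base64
import math

import numpy as np
import torch

# Assumption: weights are floating point, so the per-element byte width
# identifies the wire dtype (2 bytes could otherwise also mean int16).
_WIRE_BY_ITEMSIZE = {2: np.float16, 4: np.float32, 8: np.float64}


def decode_from_http_sketch(entry: dict) -> tuple[str, torch.Tensor]:
    """Hypothetical re-implementation of the behaviour described above."""
    shape = tuple(entry["shape"])
    # Assumption: dtype is stored as e.g. "bfloat16", optionally "torch."-prefixed.
    target = getattr(torch, entry["dtype"].removeprefix("torch."))
    raw = base64.b64decode(entry["data"])
    numel = math.prod(shape)
    if numel == 0:
        return entry["name"], torch.empty(shape, dtype=target)
    wire = _WIRE_BY_ITEMSIZE[len(raw) // numel]  # bf16 arrives as fp16 (2 bytes)
    arr = np.frombuffer(raw, dtype=wire).reshape(shape)
    # frombuffer returns a read-only view, so copy before handing it to torch.
    return entry["name"], torch.from_numpy(arr.copy()).to(target)


payload = np.array([[1.5, -2.0], [0.25, 4.0]], dtype=np.float16).tobytes()
entry = {
    "name": "layer.weight",
    "dtype": "bfloat16",
    "shape": [2, 2],
    "data": base64.b64encode(payload).decode("ascii"),
}
name, t = decode_from_http_sketch(entry)
print(name, t.dtype, tuple(t.shape))  # layer.weight torch.bfloat16 (2, 2)
```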
decode_from_ipc
utils.weight_serde.decode_from_ipc(entry)
Decode an IPC-encoded weight entry back to a named tensor.
Treats target_dtype as optional, for backward compatibility with older
serve code that may not include it.
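A minimal sketch of the optional target_dtype handling, assuming PyTorch and NumPy. decode_from_ipc_sketch is hypothetical; it assumes the wire dtype is stored as a NumPy dtype name such as "float16", and that an entry without target_dtype is simply left in its wire dtype (the real fallback may differ).

```python
import numpy as np
import torch


def decode_from_ipc_sketch(entry: dict) -> tuple[str, torch.Tensor]:
    """Hypothetical re-implementation of the IPC decode described above."""
    # Assumption: the wire dtype is a NumPy dtype name such as "float16".
    arr = np.frombuffer(entry["data"], dtype=np.dtype(entry["dtype"]))
    tensor = torch.from_numpy(arr.reshape(tuple(entry["shape"])).copy())
    target = entry.get("target_dtype")  # absent in entries from older serve code
    if target is not None:
        tensor = tensor.to(getattr(torch, target.removeprefix("torch.")))
    return entry["name"], tensor


raw = np.array([0.5, -1.0, 3.0], dtype=np.float16).tobytes()
current = {"name": "w", "data": raw, "dtype": "float16",
           "target_dtype": "bfloat16", "shape": [3]}
legacy = {k: v for k, v in current.items() if k != "target_dtype"}

print(decode_from_ipc_sketch(current)[1].dtype)  # torch.bfloat16
print(decode_from_ipc_sketch(legacy)[1].dtype)   # torch.float16
```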
encode_for_http
utils.weight_serde.encode_for_http(name, weight)
Encode a named parameter for JSON transport over HTTP.
Returns a dict with keys: name, dtype (original), shape, data (base64).
bf16 tensors are sent as fp16 bytes; the original dtype is preserved in
the dtype field so the receiver can cast back.
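The encoding side can be sketched as below, assuming PyTorch and NumPy. encode_for_http_sketch is hypothetical, and storing the dtype without the "torch." prefix is an assumption about the field format. JSON cannot carry raw bytes, which is why the payload is base64-encoded into ASCII text.

```python
import base64
import json

import torch


def encode_for_http_sketch(name: str, weight: torch.Tensor) -> dict:
    """Hypothetical re-implementation of the HTTP encoding described above."""
    original = str(weight.dtype).removeprefix("torch.")  # e.g. "bfloat16"
    wire = weight.detach().cpu()
    if wire.dtype == torch.bfloat16:
        wire = wire.to(torch.float16)  # NumPy cannot represent bf16
    raw = wire.contiguous().numpy().tobytes()
    return {
        "name": name,
        "dtype": original,
        "shape": list(weight.shape),
        # JSON cannot carry raw bytes, so base64-encode them into ASCII text.
        "data": base64.b64encode(raw).decode("ascii"),
    }


w = torch.tensor([[1.0, 2.0, 3.0]], dtype=torch.bfloat16)
entry = encode_for_http_sketch("layer.weight", w)
body = json.dumps(entry)  # ready to send as a JSON request body
print(entry["dtype"], entry["shape"], len(base64.b64decode(entry["data"])))
# bfloat16 [1, 3] 6
```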
encode_for_ipc
utils.weight_serde.encode_for_ipc(name, weight)
Encode a tensor for vLLM’s multiproc IPC (raw bytes, no base64).
Returns a dict with keys: name, data (bytes), dtype (wire), target_dtype (original), shape. bf16 tensors are serialized as fp16.
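A minimal sketch of the IPC encoding, assuming PyTorch and NumPy. encode_for_ipc_sketch is hypothetical, and the dtype strings ("float16", "bfloat16") are an assumed format. Since the IPC channel carries bytes directly, skipping base64 avoids its roughly 33% size overhead (4 output characters per 3 input bytes) and the extra encode/decode pass.

```python
import torch


def encode_for_ipc_sketch(name: str, weight: torch.Tensor) -> dict:
    """Hypothetical re-implementation of the IPC encoding described above."""
    target = str(weight.dtype).removeprefix("torch.")  # original dtype
    wire = weight.detach().cpu()
    if wire.dtype == torch.bfloat16:
        wire = wire.to(torch.float16)  # serialize bf16 as fp16
    arr = wire.contiguous().numpy()
    return {
        "name": name,
        "data": arr.tobytes(),    # raw bytes; no base64 step
        "dtype": str(arr.dtype),  # wire dtype, e.g. "float16"
        "target_dtype": target,   # e.g. "bfloat16"
        "shape": list(weight.shape),
    }


entry = encode_for_ipc_sketch("w", torch.ones(2, 3, dtype=torch.bfloat16))
print(entry["dtype"], entry["target_dtype"], len(entry["data"]))
# float16 bfloat16 12
```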