utils.weight_serde

Serialize / deserialize tensors for HTTP and IPC weight sync.

NumPy doesn’t support bfloat16, so bf16 tensors are cast to fp16 on the wire and reconstructed at the destination. All encode/decode helpers live here so the logic isn’t duplicated across trl_vllm.py, vllm_serve_lora.py, and vllm_worker_ext.py.
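The fp16 detour works because fp16 carries more mantissa bits than bf16, so a bf16 value inside fp16's normal range survives the trip unchanged. The following is a minimal stdlib-only sketch of that property, not the module's code: bf16 is emulated by truncating the float32 bit pattern (real conversions round), and the helper names are hypothetical. Values outside fp16's range, such as magnitudes above 65504, are not covered here.

```python
import struct


def truncate_to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def through_fp16(x: float) -> float:
    """Pack a value as IEEE half precision and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Three values already snapped to the (emulated) bf16 grid.
weights = [truncate_to_bf16(v) for v in (0.1234, -1.75, 3.0e-3)]

# Sending them as fp16 and reading them back changes nothing.
round_tripped = [through_fp16(v) for v in weights]
```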

Functions

Name              Description
decode_from_http  Decode an HTTP-encoded weight entry back to a named tensor.
decode_from_ipc   Decode an IPC-encoded weight entry back to a named tensor.
encode_for_http   Encode a named parameter for JSON transport over HTTP.
encode_for_ipc    Encode a tensor for vLLM’s multiproc IPC (raw bytes, no base64).

decode_from_http

utils.weight_serde.decode_from_http(entry)

Decode an HTTP-encoded weight entry back to a named tensor.

Infers the wire dtype from the payload’s byte count (bf16 arrives as fp16), then casts the result to the original dtype stored in entry["dtype"].
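One way to picture the byte-count inference is the stdlib-only sketch below. It is an illustration under assumptions, not the module's implementation: the function name, the little-endian layout, the dtype label strings, and the 2-byte/4-byte mapping are all hypothetical, and plain Python floats stand in for tensors.

```python
import base64
import math
import struct

# Assumed mapping from bytes per element to a struct format code (fp16, fp32).
WIRE_CODES = {2: "e", 4: "f"}


def sketch_decode_from_http(entry):
    """Hypothetical stand-in: returns (name, values, original dtype label)."""
    raw = base64.b64decode(entry["data"])
    numel = math.prod(entry["shape"])
    if numel == 0:
        return entry["name"], [], entry["dtype"]
    # Bytes per element tells us which dtype was used on the wire.
    itemsize, remainder = divmod(len(raw), numel)
    if remainder or itemsize not in WIRE_CODES:
        raise ValueError(f"unexpected payload size {len(raw)} for shape {entry['shape']}")
    values = list(struct.unpack(f"<{numel}{WIRE_CODES[itemsize]}", raw))
    # A real decoder would cast to entry["dtype"] here; the sketch only reports it.
    return entry["name"], values, entry["dtype"]


# Hand-built entry: a bf16 parameter whose two values arrived as fp16 bytes.
entry = {
    "name": "lm_head.weight",
    "dtype": "bfloat16",
    "shape": [1, 2],
    "data": base64.b64encode(struct.pack("<2e", 0.5, -2.0)).decode("ascii"),
}
name, values, dtype = sketch_decode_from_http(entry)
```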

decode_from_ipc

utils.weight_serde.decode_from_ipc(entry)

Decode an IPC-encoded weight entry back to a named tensor.

Handles optional target_dtype for backward compatibility with older serve code that may not include it.
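The optional-key handling might look like the stdlib-only sketch below. The function name, the dtype label strings, the little-endian layout, and in particular the choice to fall back to the wire dtype when target_dtype is absent are assumptions made for illustration; the source does not state what the fallback is.

```python
import struct

# Assumed dtype labels and their struct format codes.
CODES = {"float16": "e", "float32": "f"}


def sketch_decode_from_ipc(entry):
    """Hypothetical stand-in: returns (name, values, dtype to cast to)."""
    wire_dtype = entry["dtype"]
    # Older senders may omit target_dtype; assume the wire dtype in that case.
    target_dtype = entry.get("target_dtype", wire_dtype)
    code = CODES[wire_dtype]
    count = len(entry["data"]) // struct.calcsize("<" + code)
    values = list(struct.unpack(f"<{count}{code}", entry["data"]))
    return entry["name"], values, target_dtype


raw = struct.pack("<2e", 1.0, -0.5)
new_entry = {"name": "w", "data": raw, "dtype": "float16",
             "target_dtype": "bfloat16", "shape": [2]}
old_entry = {"name": "w", "data": raw, "dtype": "float16", "shape": [2]}

new_result = sketch_decode_from_ipc(new_entry)
old_result = sketch_decode_from_ipc(old_entry)
```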

encode_for_http

utils.weight_serde.encode_for_http(name, weight)

Encode a named parameter for JSON transport over HTTP.

Returns a dict with keys: name, dtype (original), shape, data (base64). bf16 tensors are sent as fp16 bytes; the original dtype is preserved in the dtype field so the receiver can cast back.
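As a rough picture of the returned dict, the stdlib-only sketch below builds one from a flat list of floats. It is not the module's code: the function name, its extra parameters, the dtype label strings, and the little-endian layout are hypothetical, and a list of floats stands in for the tensor.

```python
import base64
import json
import struct


def sketch_encode_for_http(name, values, shape, dtype):
    """Hypothetical stand-in; `values` is a flat list of floats."""
    # bf16 and fp16 both travel as 2-byte fp16; this sketch sends the rest as fp32.
    code = "e" if dtype in ("bfloat16", "float16") else "f"
    raw = struct.pack(f"<{len(values)}{code}", *values)
    return {
        "name": name,
        "dtype": dtype,  # original dtype, so the receiver can cast back
        "shape": list(shape),
        "data": base64.b64encode(raw).decode("ascii"),
    }


entry = sketch_encode_for_http("proj.weight", [1.0, 0.25, -3.0, 8.0], (2, 2), "bfloat16")
payload = json.dumps(entry)  # every field is JSON-serializable
```

Base64 is what makes the raw bytes safe to embed in a JSON string, at the cost of roughly a third more payload.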

encode_for_ipc

utils.weight_serde.encode_for_ipc(name, weight)

Encode a tensor for vLLM’s multiproc IPC (raw bytes, no base64).

Returns a dict with keys: name, data (bytes), dtype (wire), target_dtype (original), shape. bf16 tensors are serialized as fp16.
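The IPC entry differs from the HTTP one mainly in carrying raw bytes and both dtypes. A stdlib-only sketch under assumptions follows: the function name, its extra parameters, the dtype label strings, and the little-endian layout are hypothetical, and a flat list of floats stands in for the tensor.

```python
import struct

# Assumed dtype labels and their struct format codes.
CODES = {"float16": "e", "float32": "f"}


def sketch_encode_for_ipc(name, values, shape, dtype):
    """Hypothetical stand-in; `values` is a flat list of floats."""
    # bf16 is serialized as fp16; other dtypes keep their own wire format.
    wire_dtype = "float16" if dtype == "bfloat16" else dtype
    return {
        "name": name,
        "data": struct.pack(f"<{len(values)}{CODES[wire_dtype]}", *values),
        "dtype": wire_dtype,    # what the bytes actually contain
        "target_dtype": dtype,  # what the receiver should cast to
        "shape": list(shape),
    }


entry = sketch_encode_for_ipc("q_proj.weight", [0.5, 1.5], (2,), "bfloat16")
```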