integrations.expert_parallel.buffer
DeepEP Buffer singleton, lazily constructed on the first call to get_buffer.
A single Buffer is reused across all MoE layers in a model because DeepEP’s
intranode kernels are sized by num_nvl_bytes, which we set conservatively at
plugin init. Constructing a Buffer per layer would waste memory.
Functions
| Name | Description |
|---|---|
| configure_buffer | Stash params for lazy Buffer construction. Call from post_model_build. |
| get_buffer | Return the (lazily constructed) DeepEP Buffer. |
| reset_buffer | Drop the cached Buffer. Used in tests. |
configure_buffer
integrations.expert_parallel.buffer.configure_buffer(
ep_group,
num_nvl_bytes=256 << 20,
num_rdma_bytes=0,
)
Stash params for lazy Buffer construction. Call from post_model_build.
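The default `num_nvl_bytes=256 << 20` is written as a bit shift. As a quick check of the arithmetic, the snippet below converts it to bytes and mebibytes; the names `MIB` and `default_num_nvl_bytes` are only for illustration.

```python
MIB = 1 << 20  # bytes in one mebibyte

default_num_nvl_bytes = 256 << 20  # the default shown in the signature above

print(default_num_nvl_bytes)         # 268435456
print(default_num_nvl_bytes // MIB)  # 256
```

So the default reserves 256 MiB for the single shared Buffer.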
get_buffer
integrations.expert_parallel.buffer.get_buffer()
Return the (lazily constructed) DeepEP Buffer.
reset_buffer
integrations.expert_parallel.buffer.reset_buffer()
Drop the cached Buffer. Used in tests.