integrations.expert_parallel.buffer

DeepEP Buffer singleton, lazily constructed on the first call to get_buffer.

A single Buffer is reused across all MoE layers in a model. DeepEP's intranode kernels are sized by num_nvl_bytes, which is set conservatively at plugin init, so constructing a Buffer per layer would waste memory.
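As a rough illustration of the memory argument, the arithmetic below compares one shared Buffer with one Buffer per layer. It assumes each Buffer reserves its own num_nvl_bytes region and uses the default of 256 << 20 from configure_buffer. The layer count is a hypothetical value chosen for the example, not one taken from this module.

```python
# Sketch only: compares NVLink buffer footprint for shared vs per-layer Buffers.
MIB = 1 << 20

num_nvl_bytes = 256 << 20   # default value of configure_buffer's num_nvl_bytes
num_moe_layers = 32         # hypothetical layer count, for illustration only

# One Buffer shared by every MoE layer.
shared_mib = num_nvl_bytes // MIB

# One Buffer per MoE layer (assumes each Buffer reserves num_nvl_bytes itself).
per_layer_mib = num_moe_layers * num_nvl_bytes // MIB

print(f"shared:    {shared_mib} MiB")      # shared:    256 MiB
print(f"per layer: {per_layer_mib} MiB")   # per layer: 8192 MiB
```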

Functions

Name              Description
configure_buffer  Stash params for lazy Buffer construction. Call from post_model_build.
get_buffer        Return the (lazily constructed) DeepEP Buffer.
reset_buffer      Drop the cached Buffer. Used in tests.

configure_buffer

integrations.expert_parallel.buffer.configure_buffer(
    ep_group,
    num_nvl_bytes=256 << 20,
    num_rdma_bytes=0,
)

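Store the parameters for lazy Buffer construction; the Buffer itself is not built here. Call this from post_model_build. By default, num_nvl_bytes is 256 << 20 (256 MiB) and num_rdma_bytes is 0.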

get_buffer

integrations.expert_parallel.buffer.get_buffer()

Return the (lazily constructed) DeepEP Buffer.

reset_buffer

integrations.expert_parallel.buffer.reset_buffer()

Drop the cached Buffer. Used in tests.
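
The way the three functions fit together can be sketched as a module-level lazy singleton. This is a minimal illustration of the pattern, not the module's actual source. StandInBuffer is a hypothetical placeholder for the DeepEP Buffer so that the example runs without DeepEP or a process group. Two behaviours are assumptions: get_buffer raising RuntimeError when nothing has been configured, and reset_buffer keeping the stashed parameters.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StandInBuffer:
    """Hypothetical stand-in for the DeepEP Buffer; it only records its arguments."""
    group: Any
    num_nvl_bytes: int
    num_rdma_bytes: int


_params: Optional[dict] = None            # stashed by configure_buffer
_buffer: Optional[StandInBuffer] = None   # built on the first get_buffer call


def configure_buffer(ep_group, num_nvl_bytes=256 << 20, num_rdma_bytes=0):
    """Record construction parameters; nothing is allocated here."""
    global _params
    _params = dict(
        group=ep_group,
        num_nvl_bytes=num_nvl_bytes,
        num_rdma_bytes=num_rdma_bytes,
    )


def get_buffer():
    """Build the buffer on first use, then return the same object every time."""
    global _buffer
    if _buffer is None:
        if _params is None:
            # Assumption: the real module may handle this case differently.
            raise RuntimeError("configure_buffer must be called before get_buffer")
        _buffer = StandInBuffer(**_params)
    return _buffer


def reset_buffer():
    """Drop the cached buffer so the next get_buffer call rebuilds it."""
    global _buffer
    # Assumption: the stashed parameters are kept; only the buffer is dropped.
    _buffer = None
```

In this sketch, every MoE layer that calls get_buffer receives the same object, and a test can call reset_buffer between cases to force a fresh construction.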