monkeypatch.accelerate.parallelism_config

ParallelismConfig monkeypatch.

Two extensions:

- Allow standalone (pure) context parallelism (CP) via ACCELERATE_ALLOW_CP_STANDALONE.
- Add Expert Parallelism (ep) as a first-class mesh axis inside the data-parallel group. The mesh order is (ep, dp_replicate, dp_shard, cp, sp, tp), which keeps the dp axes contiguous, as required by _flatten("dp").
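The ordering rule can be checked with a small pure-Python sketch. It assumes, hypothetically, that only axes with size > 1 become mesh dimensions and that flattening needs the named axes to occupy adjacent, in-order positions; build_mesh_dims and axes_contiguous are illustrative helpers, not part of this module or of accelerate.

```python
from __future__ import annotations

# Axis order stated in the module docstring; ep precedes the dp axes so
# dp_replicate and dp_shard stay adjacent.
MESH_ORDER = ("ep", "dp_replicate", "dp_shard", "cp", "sp", "tp")


def build_mesh_dims(sizes: dict[str, int]) -> tuple[tuple[str, ...], tuple[int, ...]]:
    """Return (dim_names, mesh_shape) for axes with size > 1, in MESH_ORDER.

    Hypothetical helper: dropping size-1 axes is an assumption of this sketch.
    """
    unknown = set(sizes) - set(MESH_ORDER)
    if unknown:
        raise ValueError(f"unknown mesh axes: {sorted(unknown)}")
    names = tuple(name for name in MESH_ORDER if sizes.get(name, 1) > 1)
    shape = tuple(sizes[name] for name in names)
    return names, shape


def axes_contiguous(names: tuple[str, ...], group: tuple[str, ...]) -> bool:
    """True if the present members of `group` occupy adjacent, in-order slots of `names`."""
    positions = [names.index(axis) for axis in group if axis in names]
    if not positions:
        return True
    return positions == list(range(positions[0], positions[0] + len(positions)))


names, shape = build_mesh_dims({"ep": 2, "dp_replicate": 2, "dp_shard": 4, "tp": 2})
print(names, shape)  # ('ep', 'dp_replicate', 'dp_shard', 'tp') (2, 2, 4, 2)
print(axes_contiguous(names, ("dp_replicate", "dp_shard")))  # True
# A hypothetical order that puts ep between the dp axes breaks contiguity.
print(axes_contiguous(("dp_replicate", "ep", "dp_shard"), ("dp_replicate", "dp_shard")))  # False
```

Placing ep outside the (dp_replicate, dp_shard) pair is what lets both dp axes be sliced and flattened as one adjacent block.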

See expert_parallel/README.md for the full integration story.

Functions

Name	Description
patch_clip_grad_norm_for_ep	Replace Accelerator.clip_grad_norm_ with the EP-aware version when ep is composed with dp_shard and/or cp.
patch_prepare_data_loader_for_ep	Apply the EP-aware data-loader patch.
patched_is_fsdp2	Patched version of is_fsdp2 that guards against a None fsdp_plugin.

patch_clip_grad_norm_for_ep

monkeypatch.accelerate.parallelism_config.patch_clip_grad_norm_for_ep()

Replace Accelerator.clip_grad_norm_ with the EP-aware version when the active parallelism configuration composes ep with dp_shard and/or cp. This FSDP+EP composition produces DTensor gradients on more than one mesh: expert parameters are sharded on the dp_shard/cp subgroup, while non-expert parameters are sharded on the flattened dp_shard_cp mesh. The stock clip_grad_norm_ therefore cannot stack their per-parameter norms into a single tensor.
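The arithmetic behind an EP-aware clip can be sketched in plain Python. This is a minimal sketch under stated assumptions, not this module's implementation: each parameter group stands in for one mesh (experts versus non-experts), per-parameter norms are stacked only within a group, and the group norms are then combined as scalars. That is valid because the p-norm of a concatenation equals the p-norm of the per-block norms. In a real run each group norm would presumably be a DTensor that must be materialized (for instance with DTensor.full_tensor()) before combining; that step is an assumption here, and needs_ep_aware_clip, combined_grad_norm and clip_coefficient are hypothetical names.

```python
from __future__ import annotations

import math


def needs_ep_aware_clip(sizes: dict[str, int]) -> bool:
    """Hypothetical predicate mirroring the rule above: ep composed with dp_shard and/or cp."""
    return sizes.get("ep", 1) > 1 and (sizes.get("dp_shard", 1) > 1 or sizes.get("cp", 1) > 1)


def vector_norm(values: list[float], p: float) -> float:
    """p-norm of a flat list of floats; p may be math.inf."""
    if p == math.inf:
        return max((abs(v) for v in values), default=0.0)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def combined_grad_norm(groups: dict[str, list[list[float]]], p: float = 2.0) -> float:
    """Total gradient norm when each group must be reduced on its own first.

    Per-param norms are stacked only inside a group (one mesh); the resulting
    group norms are then combined as plain scalars.
    """
    group_norms = [vector_norm([vector_norm(g, p) for g in grads], p) for grads in groups.values()]
    return vector_norm(group_norms, p)


def clip_coefficient(total_norm: float, max_norm: float, eps: float = 1e-6) -> float:
    """Factor every gradient is multiplied by; 1.0 means no clipping."""
    return min(1.0, max_norm / (total_norm + eps))


grads = {
    "experts": [[3.0, 0.0]],      # stands in for grads on the dp_shard/cp subgroup
    "non_experts": [[0.0, 4.0]],  # stands in for grads on the flattened dp_shard_cp mesh
}
print(needs_ep_aware_clip({"ep": 2, "dp_shard": 4}))  # True
total = combined_grad_norm(grads)
print(total)  # 5.0
print(clip_coefficient(total, max_norm=1.0) < 1.0)  # True
```

Keeping the reduction per group avoids stacking tensors that live on different meshes; the final scalar combination is the only cross-group step.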

patch_prepare_data_loader_for_ep

monkeypatch.accelerate.parallelism_config.patch_prepare_data_loader_for_ep()

Apply the EP-aware data-loader patch.

Idempotent: replacing the bound function more than once is harmless because the wrapper closes over the current prepare_data_loader.
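The closure pattern can be illustrated with a self-contained sketch. Everything here is hypothetical: loader_module stands in for whatever module owns prepare_data_loader, and the ep_aware keyword is an invented placeholder for the real EP-aware adjustment. Each application captures the function bound at patch time, so a second application wraps the first wrapper; in this sketch that stays harmless because forcing a flag to a fixed value is idempotent.

```python
import functools
from types import SimpleNamespace

# Hypothetical stand-in for the module that owns prepare_data_loader.
loader_module = SimpleNamespace(
    prepare_data_loader=lambda dataloader, **kwargs: (dataloader, kwargs)
)


def patch_prepare_data_loader(module) -> None:
    """Wrap module.prepare_data_loader, closing over the function bound right now."""
    current = module.prepare_data_loader  # captured at patch time, may itself be a wrapper

    @functools.wraps(current)
    def wrapper(dataloader, **kwargs):
        # Placeholder EP-aware tweak: setting a flag to a fixed value gives the
        # same result however many wrappers are stacked.
        kwargs["ep_aware"] = True
        return current(dataloader, **kwargs)

    module.prepare_data_loader = wrapper


patch_prepare_data_loader(loader_module)
patch_prepare_data_loader(loader_module)  # second application stacks a wrapper
print(loader_module.prepare_data_loader("dl"))  # ('dl', {'ep_aware': True})
```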

patched_is_fsdp2

monkeypatch.accelerate.parallelism_config.patched_is_fsdp2(self)

Patched version of is_fsdp2 that guards against a None fsdp_plugin.
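A guard of this kind can be sketched as follows. This is hypothetical: it assumes the object passed as self exposes fsdp_plugin directly and that the plugin records its version in an fsdp_version attribute; the real patched function may look these up elsewhere (for example through an accelerator state object), and is_fsdp2_guarded is an illustrative name.

```python
from types import SimpleNamespace


def is_fsdp2_guarded(self) -> bool:
    """Hypothetical sketch of a None-safe is_fsdp2 check."""
    plugin = getattr(self, "fsdp_plugin", None)
    if plugin is None:
        # No FSDP plugin configured, so this cannot be an FSDP2 run; return
        # False instead of raising AttributeError on None.fsdp_version.
        return False
    return plugin.fsdp_version == 2


print(is_fsdp2_guarded(SimpleNamespace(fsdp_plugin=None)))  # False
print(is_fsdp2_guarded(SimpleNamespace(fsdp_plugin=SimpleNamespace(fsdp_version=2))))  # True
```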