monkeypatch.accelerate.parallelism_config

ParallelismConfig monkeypatch.

Two extensions:

- Allow pure CP standalone via ACCELERATE_ALLOW_CP_STANDALONE.
- Add Expert Parallel (ep) as a first-class mesh axis inside the data-parallel group. Mesh order is (ep, dp_replicate, dp_shard, cp, sp, tp) so the dp axes stay contiguous (required for _flatten("dp")).
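The contiguity requirement can be illustrated with a small standalone check. This is only a sketch: it assumes "the dp axes" means dp_replicate and dp_shard, and the helper axes_are_contiguous and the alternative ordering are hypothetical names introduced for illustration, not part of this module.

```python
# Axis order as quoted in the module description.
MESH_AXES = ("ep", "dp_replicate", "dp_shard", "cp", "sp", "tp")

# Assumption: these are the axes that get flattened into "dp".
DP_AXES = ("dp_replicate", "dp_shard")


def axes_are_contiguous(order, group):
    """Return True if every axis in `group` sits in one unbroken run of `order`."""
    positions = sorted(order.index(name) for name in group)
    return positions == list(range(positions[0], positions[0] + len(positions)))


# Hypothetical ordering that places ep between the two dp axes.
INTERLEAVED_AXES = ("dp_replicate", "ep", "dp_shard", "cp", "sp", "tp")

print(axes_are_contiguous(MESH_AXES, DP_AXES))
print(axes_are_contiguous(INTERLEAVED_AXES, DP_AXES))
```

Placing ep at the front of the tuple keeps dp_replicate and dp_shard adjacent, which the interleaved ordering would not.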

See expert_parallel/README.md for the full integration story.

Functions

Name Description
patch_clip_grad_norm_for_ep Replace Accelerator.clip_grad_norm_ with the EP-aware version when the active parallelism includes both ep and dp_shard.
patch_prepare_data_loader_for_ep Apply the EP-aware data-loader patch.
patched_is_fsdp2 Patched version of is_fsdp2 that guards against a None fsdp_plugin.

patch_clip_grad_norm_for_ep

monkeypatch.accelerate.parallelism_config.patch_clip_grad_norm_for_ep()

Replace Accelerator.clip_grad_norm_ with the EP-aware version when the active parallelism includes both ep and dp_shard (i.e., the FSDP+EP composition produces multi-mesh DTensor grads).
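The conditional replacement described above might look roughly like the following. It is a sketch under stated assumptions, not the module's implementation: StandInAccelerator, ep_aware_clip, needs_ep_aware_clipping, and maybe_patch are hypothetical stand-ins, and an axis is treated as "included" when its size is greater than 1.

```python
class StandInAccelerator:
    """Hypothetical stand-in for the real Accelerator class."""

    def clip_grad_norm_(self, parameters, max_norm):
        return "default"


def ep_aware_clip(self, parameters, max_norm):
    """Hypothetical placeholder for an EP-aware clipping routine."""
    return "ep-aware"


def needs_ep_aware_clipping(axis_sizes):
    # Assumption: an axis counts as active when its size exceeds 1.
    return axis_sizes.get("ep", 1) > 1 and axis_sizes.get("dp_shard", 1) > 1


def maybe_patch(cls, axis_sizes):
    """Swap in the EP-aware method only when both ep and dp_shard are active."""
    if needs_ep_aware_clipping(axis_sizes):
        cls.clip_grad_norm_ = ep_aware_clip
        return True
    return False
```

With only ep active, the stand-in class keeps its default method; the swap happens only when dp_shard is active as well.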

patch_prepare_data_loader_for_ep

monkeypatch.accelerate.parallelism_config.patch_prepare_data_loader_for_ep()

Apply the EP-aware data-loader patch.

Idempotent: replacing the bound function more than once is harmless because the wrapper closes over the current prepare_data_loader.
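The closure behavior behind this claim can be shown with a toy example. Everything here is a hypothetical stand-in (loader_api, apply_patch, and a prepare_data_loader that simply materializes a list); the real wrapper's EP-specific logic is not reproduced.

```python
import types

# Hypothetical stand-in for the object that owns prepare_data_loader.
loader_api = types.SimpleNamespace(
    prepare_data_loader=lambda dataset: list(dataset)
)


def apply_patch(namespace):
    """Wrap whatever prepare_data_loader is currently bound on `namespace`."""
    current = namespace.prepare_data_loader  # captured at patch time

    def wrapper(*args, **kwargs):
        # EP-specific adjustments would go here; this sketch only delegates.
        return current(*args, **kwargs)

    namespace.prepare_data_loader = wrapper


# Applying the patch twice nests one wrapper inside another, and each
# layer forwards to the function it captured, so the result is unchanged.
apply_patch(loader_api)
apply_patch(loader_api)
result = loader_api.prepare_data_loader(range(3))
print(result)
```

Because each wrapper captures the function bound at the moment of patching, a repeated call adds a pass-through layer rather than losing the original behavior.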

patched_is_fsdp2

monkeypatch.accelerate.parallelism_config.patched_is_fsdp2(self)

Patched version of is_fsdp2 that guards against a None fsdp_plugin.
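A None guard of this kind might be sketched as follows. The function name is_fsdp2_guarded and the SimpleNamespace objects are hypothetical, and the assumption that the plugin exposes an fsdp_version attribute (with 2 meaning FSDP2) should be checked against the installed Accelerate version.

```python
from types import SimpleNamespace


def is_fsdp2_guarded(state):
    """Return False instead of raising when no FSDP plugin is configured."""
    plugin = getattr(state, "fsdp_plugin", None)
    if plugin is None:
        return False
    # Assumption: the plugin carries an `fsdp_version` attribute.
    return getattr(plugin, "fsdp_version", 1) == 2


# Hypothetical states: one without a plugin, one configured for FSDP2.
no_plugin = SimpleNamespace(fsdp_plugin=None)
fsdp2 = SimpleNamespace(fsdp_plugin=SimpleNamespace(fsdp_version=2))

print(is_fsdp2_guarded(no_plugin))
print(is_fsdp2_guarded(fsdp2))
```

Without the guard, reading an attribute on a None plugin would raise AttributeError.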