monkeypatch.tiled_mlp.patch

Monkeypatch for Tiled MLP implementation

Functions

Name                             Description
patch_tiled_mlp                  Install the class-level tiled MLP patch.
patch_tiled_mlp_moe_instances    Re-wrap each MoE block instance’s forward after model load.

patch_tiled_mlp

monkeypatch.tiled_mlp.patch.patch_tiled_mlp(
    model_type,
    use_original_mlp=True,
    cfg_num_shards=None,
    use_scattermoe=False,
)

Install the class-level tiled MLP patch.

For dense models this patches {prefix}MLP (falling back to {prefix}TextMLP for multimodal wrappers).
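The fallback lookup described above can be pictured with a small sketch. This is an illustration only, not the library's code: resolve_mlp_class, the stand-in module objects, and the "Dense" and "Vision" prefixes are all hypothetical names invented for the example.

```python
from types import SimpleNamespace


def resolve_mlp_class(module, prefix):
    """Hypothetical helper: return {prefix}MLP if present, else {prefix}TextMLP, else None."""
    for suffix in ("MLP", "TextMLP"):
        cls = getattr(module, f"{prefix}{suffix}", None)
        if cls is not None:
            return cls
    return None


# Stand-in classes; real modeling modules would define their own.
class DenseMLP:
    pass


class VisionTextMLP:
    pass


# Stand-ins for a dense modeling module and a multimodal wrapper module.
dense_module = SimpleNamespace(DenseMLP=DenseMLP)
multimodal_module = SimpleNamespace(VisionTextMLP=VisionTextMLP)

dense_cls = resolve_mlp_class(dense_module, "Dense")            # found as DenseMLP
multimodal_cls = resolve_mlp_class(multimodal_module, "Vision")  # falls back to VisionTextMLP
missing_cls = resolve_mlp_class(dense_module, "Other")           # neither name exists
```

Trying the plain name first keeps dense models on the common path, and the second name is only consulted when the first is absent.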

For MoE models with scattermoe-lora active, routing and expert invocation both happen in the forward of the MoE block class ({prefix}SparseMoeBlock, {prefix}MoeMLP, or {prefix}MoE), so that class is patched instead. Note that the kernels library installs scattermoe-lora’s forward at the instance level during model.kernelize(), so the class-level patch is shadowed at runtime. patch_tiled_mlp_moe_instances is the companion step, run after model load, that re-wraps each MoE block instance so the tiled forward runs on top of the kernels-installed forward.
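The shadowing effect follows from ordinary Python attribute lookup: a plain function stored on a class is a non-data descriptor, so an attribute set on the instance takes precedence over it. The sketch below shows this with a toy class. Block, tiled_forward, and the lambda standing in for a kernels-installed forward are hypothetical and carry no real model logic.

```python
class Block:
    """Toy stand-in for an MoE block class."""

    def forward(self, x):
        return f"original({x})"


def tiled_forward(self, x):
    # Stand-in for the tiled forward a class-level patch would install.
    return f"tiled({x})"


block = Block()

# Class-level patch: every instance without its own forward now sees it.
Block.forward = tiled_forward
before_instance_binding = block.forward("x")

# Instance-level binding, standing in for what a later step might install.
block.forward = lambda x: f"kernel({x})"
after_instance_binding = block.forward("x")

# The class still holds the tiled forward, but this instance no longer uses it.
class_level_result = Block.forward(block, "x")
```

After the instance-level assignment, calls on that instance never reach the class attribute, which is why a class-level patch alone is not enough in that situation.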

patch_tiled_mlp_moe_instances

monkeypatch.tiled_mlp.patch.patch_tiled_mlp_moe_instances(
    model,
    model_type,
    cfg_num_shards=None,
)

Re-wrap each MoE block instance’s forward after model load.

The kernels library installs scattermoe-lora’s forward on each MoE block instance during model.kernelize() (called inside from_pretrained). That instance-level binding shadows the class-level patch that patch_tiled_mlp installs, so without this step tiling is silently bypassed on every block. This function captures each instance’s current forward (the kernels-installed one) and rebinds the instance to a tiled forward that delegates to it.
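The capture-and-rebind step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the actual implementation: MoEBlock, rewrap_instance, and the list-based "tokens" are hypothetical, sharding is done by simple slicing, and nothing here addresses gradients or tensor shapes.

```python
import types


class MoEBlock:
    """Toy stand-in for an MoE block; forward doubles each token."""

    def forward(self, tokens):
        return [t * 2 for t in tokens]


def rewrap_instance(block, num_shards):
    """Hypothetical helper: rebind block.forward to a sharded wrapper around its current forward."""
    inner_forward = block.forward  # whatever is bound right now, class-level or instance-level
    shard_sizes = []               # recorded only so the example can show the sharding

    def tiled_forward(self, tokens):
        # Ceiling division, with a floor of 1 so an empty input does not give a zero step.
        shard_len = max(1, -(-len(tokens) // num_shards))
        out = []
        for start in range(0, len(tokens), shard_len):
            shard = tokens[start:start + shard_len]
            shard_sizes.append(len(shard))
            out.extend(inner_forward(shard))
        return out

    block.forward = types.MethodType(tiled_forward, block)
    return shard_sizes


block = MoEBlock()
# Stand-in for a forward installed on the instance by an earlier step.
block.forward = lambda tokens: [t + 1 for t in tokens]

shard_sizes = rewrap_instance(block, num_shards=2)
result = block.forward([1, 2, 3, 4, 5])
```

Because the wrapper closes over the forward that was bound at wrap time, the instance-level forward still does the real work and the wrapper only controls how the input is split.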

Does nothing if no MoE block class exists for model_type or if model contains no instances of it.