monkeypatch.tiled_mlp.patch
Monkeypatch for Tiled MLP implementation
Functions
| Name | Description |
|---|---|
| patch_tiled_mlp | Install the class-level tiled MLP patch. |
| patch_tiled_mlp_moe_instances | Re-wrap each MoE block instance’s forward after model load. |
patch_tiled_mlp
monkeypatch.tiled_mlp.patch.patch_tiled_mlp(
    model_type,
    use_original_mlp=True,
    cfg_num_shards=None,
    use_scattermoe=False,
)

Install the class-level tiled MLP patch.
For dense models this patches {prefix}MLP (falling back to
{prefix}TextMLP for multimodal wrappers).
For MoE models with scattermoe-lora active, the MoE block class
({prefix}SparseMoeBlock / {prefix}MoeMLP / {prefix}MoE) is the
one whose forward does routing + expert invocation, so we patch that.
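The class lookup described above can be sketched as a name search with fallbacks. This is an illustration only, not the library's code: resolve_patch_target, the fake module, and the "Toy" prefix are hypothetical, and the order in which the MoE class names are tried is an assumption (the text lists the candidates but does not state a precedence).

```python
from types import SimpleNamespace


def resolve_patch_target(module, prefix, use_scattermoe=False):
    """Return the first class in `module` matching a candidate name, or None.

    Hypothetical helper: the candidate order is an assumption.
    """
    if use_scattermoe:
        # MoE case: the block that does routing + expert invocation.
        suffixes = ("SparseMoeBlock", "MoeMLP", "MoE")
    else:
        # Dense case: plain MLP, then the multimodal-wrapper fallback.
        suffixes = ("MLP", "TextMLP")
    for suffix in suffixes:
        cls = getattr(module, f"{prefix}{suffix}", None)
        if cls is not None:
            return cls
    return None


class ToyTextMLP:
    pass


class ToySparseMoeBlock:
    pass


# Stand-in for a modeling module that only exposes the fallback names.
fake_module = SimpleNamespace(
    ToyTextMLP=ToyTextMLP,
    ToySparseMoeBlock=ToySparseMoeBlock,
)

dense_target = resolve_patch_target(fake_module, "Toy")
moe_target = resolve_patch_target(fake_module, "Toy", use_scattermoe=True)
missing_target = resolve_patch_target(fake_module, "Other")
```

Here dense_target resolves to the TextMLP fallback because no ToyMLP exists, and missing_target is None because nothing matches the "Other" prefix.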
Note that the kernels library installs scattermoe-lora’s forward at
the instance level during model.kernelize(), so the class-level
patch is shadowed at runtime. patch_tiled_mlp_moe_instances is the
companion step, run after model load, that re-wraps each MoE block
instance so the tiled forward runs on top of the kernels-installed forward.
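The shadowing comes from ordinary Python attribute lookup: a function stored on an instance is found before a method defined on its class. The toy below shows only that mechanism; ToyBlock, tiled_class_forward, and kernel_forward are hypothetical stand-ins, not names from the library.

```python
class ToyBlock:
    def forward(self, x):
        return f"original({x})"


def tiled_class_forward(self, x):
    # Stand-in for the forward installed by the class-level patch.
    return f"tiled({x})"


def kernel_forward(x):
    # Stand-in for a forward bound directly onto one instance.
    return f"kernel({x})"


# 1. Class-level patch: affects every instance without its own `forward`.
ToyBlock.forward = tiled_class_forward

# 2. Instance-level binding on one block (what the text says kernelize does).
shadowed = ToyBlock()
shadowed.forward = kernel_forward

untouched = ToyBlock()

shadowed_result = shadowed.forward("x")    # instance attribute wins
untouched_result = untouched.forward("x")  # class-level patch is used
```

The shadowed block never reaches tiled_class_forward, which is why a class-level patch alone is not enough once forwards are bound per instance.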
patch_tiled_mlp_moe_instances
monkeypatch.tiled_mlp.patch.patch_tiled_mlp_moe_instances(
    model,
    model_type,
    cfg_num_shards=None,
)

Re-wrap each MoE block instance’s forward after model load.
The kernels library installs scattermoe-lora’s forward on each MoE
block instance during model.kernelize() (called inside
from_pretrained). That instance-level binding shadows the class-level
patch that patch_tiled_mlp installs, so without this step tiling is
silently bypassed on every block. We capture each instance’s current
forward (the kernels-installed one) and rebind the instance to a tiled
forward that delegates to it.
Does nothing if no MoE block class exists for model_type or if
model contains no instances of it.
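A minimal sketch of the capture-and-rebind step, using plain lists in place of tensors so it runs without any ML dependencies. Everything here is hypothetical (ToyMoeBlock, rewrap_instances, the shard-splitting rule); it illustrates the wrapping pattern under those assumptions and is not the library's actual tiling logic.

```python
import types


class ToyMoeBlock:
    def forward(self, tokens):
        return [t * 2 for t in tokens]


def rewrap_instances(modules, num_shards):
    """Rebind each ToyMoeBlock's forward to a tiled wrapper; return the count."""
    patched = 0
    for module in modules:
        if not isinstance(module, ToyMoeBlock):
            continue
        inner = module.forward  # capture whatever forward is active right now

        def tiled_forward(self, tokens, _inner=inner):
            # Split the input into at most `num_shards` chunks (ceil division),
            # delegate each chunk to the captured forward, then concatenate.
            shard_len = max(1, -(-len(tokens) // num_shards))
            out = []
            for start in range(0, len(tokens), shard_len):
                out.extend(_inner(tokens[start:start + shard_len]))
            return out

        module.forward = types.MethodType(tiled_forward, module)
        patched += 1
    return patched


calls = []


def kernel_forward(tokens):
    # Stand-in for an instance-level forward installed before re-wrapping.
    calls.append(len(tokens))
    return [t * 2 for t in tokens]


block = ToyMoeBlock()
block.forward = kernel_forward

# Non-matching modules are skipped, so an empty match is simply a no-op.
patched = rewrap_instances([block, object()], num_shards=4)
output = block.forward(list(range(10)))
```

After the call, calls records one entry per shard, showing that the tiled wrapper runs on top of the previously installed forward rather than replacing it.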