monkeypatch.tiled_mlp.base
TiledMLP support for DDP, FSDP, and single GPU
Classes
| Name | Description |
|---|---|
| GradientAccumulator | Manual gradient accumulator for TiledMLP with configurable precision. |
| TiledMLP | TiledMLP implementation using gradient hooks |
GradientAccumulator
monkeypatch.tiled_mlp.base.GradientAccumulator(params, total_shards, dtype=None)
Manual gradient accumulator for TiledMLP with configurable precision.
.. note::
The production TiledMLP backward accumulates gradients inline and
does not call this class. It is retained as a reference and opt-in
path for callers that want hook-based accumulation. Its defaults
match the inline path: the accumulator uses the parameter dtype
(matching AccumulateGrad in the unsharded backward), and each shard
is scaled by 1.0 (gradients from sequence-dimension shards are
additive, not averaged).
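The additivity claim in the note can be checked with a small worked example. The sketch below is plain Python and does not use this module. It assumes a toy scalar weight `w` and a loss that is a sum over tokens; the helper `grad_wrt_w` is hypothetical and exists only for this illustration.

```python
def grad_wrt_w(w, xs):
    """Gradient w.r.t. w of the toy loss sum_i (w * x_i)**2 over tokens xs."""
    return sum(2.0 * w * x * x for x in xs)


w = 0.5
tokens = [1.0, 2.0, 3.0, 4.0]
# Split the "sequence" into two shards, as a tiled backward would.
shards = [tokens[:2], tokens[2:]]

full_grad = grad_wrt_w(w, tokens)               # unsharded reference
shard_grads = [grad_wrt_w(w, s) for s in shards]

summed = sum(shard_grads)                       # 1.0 per-shard scaling
averaged = summed / len(shards)                 # 1 / total_shards scaling

print(full_grad, summed, averaged)
```

Under these assumptions the summed shard gradients reproduce the unsharded gradient, while dividing by the shard count shrinks it by that factor.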
Methods
| Name | Description |
|---|---|
| cleanup | Remove all installed hooks |
| install_hooks | Install gradient hooks that accumulate gradients in higher precision |
cleanup
monkeypatch.tiled_mlp.base.GradientAccumulator.cleanup()
Remove all installed hooks
install_hooks
monkeypatch.tiled_mlp.base.GradientAccumulator.install_hooks(is_last_shard)
Install gradient hooks that accumulate gradients in higher precision
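The following is a minimal sketch of how hook-based accumulation across shards can work in PyTorch. It is not the source of `GradientAccumulator`: the class `ShardGradAccumulator`, its buffers, and the choice to return zeros from the hook on non-final shards are assumptions made for illustration. It only mirrors the documented method names (`install_hooks(is_last_shard)`, `cleanup()`), the `dtype=None` default, and the 1.0 per-shard scaling.

```python
import torch


class ShardGradAccumulator:
    """Hypothetical stand-in for illustration; not the library class."""

    def __init__(self, params, dtype=None):
        self.params = [p for p in params if p.requires_grad]
        # Assumption: dtype=None means "accumulate in the parameter's dtype".
        self.buffers = {
            p: torch.zeros_like(p, dtype=dtype or p.dtype) for p in self.params
        }
        self.handles = []

    def install_hooks(self, is_last_shard):
        self.cleanup()  # avoid stacking hooks left over from a previous shard
        for p in self.params:
            self.handles.append(p.register_hook(self._make_hook(p, is_last_shard)))

    def _make_hook(self, p, is_last_shard):
        def hook(grad):
            buf = self.buffers[p]
            buf += grad.to(buf.dtype)  # scale 1.0: shard gradients are summed
            if is_last_shard:
                return buf.to(grad.dtype)  # pass the running total to autograd
            return torch.zeros_like(grad)  # keep partial sums out of .grad
        return hook

    def cleanup(self):
        for handle in self.handles:
            handle.remove()
        self.handles.clear()


torch.manual_seed(0)
mlp = torch.nn.Linear(4, 4)
x = torch.randn(1, 8, 4)  # [batch, seq, hidden]

# Reference: one unsharded backward pass.
mlp(x).sum().backward()
expected = [p.grad.clone() for p in mlp.parameters()]
mlp.zero_grad()

# Sharded: one backward pass per sequence shard, accumulated through hooks.
acc = ShardGradAccumulator(mlp.parameters())
shards = x.chunk(4, dim=1)
for i, shard in enumerate(shards):
    acc.install_hooks(is_last_shard=(i == len(shards) - 1))
    mlp(shard).sum().backward()
acc.cleanup()

matches = all(
    torch.allclose(p.grad, e, atol=1e-5)
    for p, e in zip(mlp.parameters(), expected)
)
print(matches)
```

Passing a wider `dtype` such as `torch.float32` for low-precision parameters would make the buffers accumulate in higher precision, at the cost of extra memory for the buffers.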
TiledMLP
monkeypatch.tiled_mlp.base.TiledMLP()
TiledMLP implementation using gradient hooks
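To illustrate the general idea of tiling, the sketch below runs an MLP over sequence-dimension shards and checks that the result matches the unsharded forward pass. It is not the `TiledMLP` implementation: `tiled_forward` is a hypothetical helper, the `[batch, seq, hidden]` layout is an assumption, and the sketch omits the custom backward that a real implementation would need in order to save activation memory.

```python
import torch


def tiled_forward(mlp, x, num_shards):
    """Apply `mlp` to each sequence shard of x and stitch the outputs together."""
    # Assumed layout: x is [batch, seq, hidden], so the sequence is dim=1.
    shards = x.chunk(num_shards, dim=1)
    return torch.cat([mlp(shard) for shard in shards], dim=1)


torch.manual_seed(0)
mlp = torch.nn.Sequential(
    torch.nn.Linear(8, 32),
    torch.nn.GELU(),
    torch.nn.Linear(32, 8),
)
x = torch.randn(2, 16, 8)

full = mlp(x)
tiled = tiled_forward(mlp, x, num_shards=4)
print(torch.allclose(full, tiled, atol=1e-5))
```

Tiling is valid here because an MLP acts on each token independently, so splitting the sequence does not change any token's output.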