monkeypatch.tiled_mlp.base

TiledMLP support for DDP, FSDP, and single-GPU training.

Classes

| Name | Description |
|------|-------------|
| GradientAccumulator | Manual gradient accumulator for TiledMLP with configurable precision. |
| TiledMLP | TiledMLP implementation using gradient hooks. |

GradientAccumulator

monkeypatch.tiled_mlp.base.GradientAccumulator(params, total_shards, dtype=None)

Manual gradient accumulator for TiledMLP with configurable precision.

Note: The production TiledMLP backward pass accumulates gradients inline and does not call this class; the class is retained as a reference, opt-in path for callers that want hook-based accumulation. Its defaults match the inline path: a param-dtype accumulator (matching AccumulateGrad in the unsharded backward) and 1.0 per-shard scaling, because gradients from shards split along the sequence dimension are additive, not averaged.
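Why 1.0 per-shard scaling is the right default for sequence-dim sharding can be checked with a minimal pure-Python sketch. This is a toy scalar linear layer `y_t = w * x_t`, not code from this module, and the helpers `weight_grad` and `shard` are hypothetical. The weight gradient is a sum over tokens, so summing per-shard partial gradients reproduces the full-sequence gradient exactly, while averaging them (scaling by 1/N) does not.

```python
# Toy "linear layer" y_t = w * x_t applied to every token in a sequence.
# With upstream gradients g_t = dL/dy_t, the weight gradient is dL/dw = sum_t g_t * x_t.

def weight_grad(xs, gs):
    """Gradient of sum_t g_t * (w * x_t) with respect to w."""
    return sum(g * x for x, g in zip(xs, gs))

def shard(seq, num_shards):
    """Split a sequence into contiguous, roughly equal shards along its length."""
    size = max(1, -(-len(seq) // num_shards))  # ceiling division, at least 1
    return [seq[i:i + size] for i in range(0, len(seq), size)]

xs = [0.5, -1.0, 2.0, 3.0, 1.5, -0.25]
gs = [1.0, 0.5, -2.0, 0.25, 4.0, 1.0]

full = weight_grad(xs, gs)  # unsharded reference

x_shards, g_shards = shard(xs, 3), shard(gs, 3)
per_shard = [weight_grad(xsh, gsh) for xsh, gsh in zip(x_shards, g_shards)]

summed = sum(1.0 * g for g in per_shard)               # per-shard scale 1.0
averaged = sum(g / len(per_shard) for g in per_shard)  # per-shard scale 1/N

print("full:", full, "summed:", summed, "averaged:", averaged)
```

Averaging is the right reduction for data-parallel replicas that each see a different batch, but here every shard contributes a disjoint slice of one sum, so the partial gradients must simply be added.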

Methods

| Name | Description |
|------|-------------|
| cleanup | Remove all installed hooks. |
| install_hooks | Install gradient hooks that accumulate gradients in higher precision. |

cleanup
monkeypatch.tiled_mlp.base.GradientAccumulator.cleanup()

Remove all installed hooks

install_hooks
monkeypatch.tiled_mlp.base.GradientAccumulator.install_hooks(is_last_shard)

Install gradient hooks that accumulate gradients in higher precision
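The reason for accumulating in higher precision can be illustrated with a small stdlib-only sketch. It is a toy, not this class's hook implementation: `struct`'s half-precision format code `e` is used to round values to float16, and a plain Python float (float64) stands in for a wider accumulator such as float32. The names `to_fp16`, `identity`, and `accumulate` are hypothetical. Once the running float16 total grows large enough that each small per-shard gradient is less than half the spacing between adjacent float16 values, additions round away and the total stalls, while the wider accumulator stays close to the true sum.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision (float16) value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def identity(value):
    """No rounding: Python floats are float64, standing in for a wider accumulator."""
    return value

def accumulate(shard_grads, round_fn):
    """Sum per-shard gradients, rounding the running total to the accumulator dtype."""
    total = 0.0
    for g in shard_grads:
        total = round_fn(total + g)
    return total

# Hypothetical workload: 10,000 shards, each contributing a small float16 gradient.
shard_grads = [to_fp16(1e-4)] * 10_000

low = accumulate(shard_grads, to_fp16)    # float16 accumulator
high = accumulate(shard_grads, identity)  # higher-precision accumulator

print("float16 accumulator:", low)
print("higher-precision accumulator:", high)
```

The true sum is about 1.0; the float16 accumulator stops growing far short of it, which is the failure mode a wider accumulation dtype is meant to avoid.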

TiledMLP

monkeypatch.tiled_mlp.base.TiledMLP()

TiledMLP implementation using gradient hooks
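The general sequence-tiling idea behind a TiledMLP can be sketched in pure Python. This is a generic illustration under simplifying assumptions, not this module's implementation: the MLP is a toy scalar `y = w2 * relu(w1 * x)` applied per token, and `mlp_forward`, `shard_bounds`, `tiled_forward`, and `tiled_weight_grads` are hypothetical names. Because the MLP acts on each token independently, running it shard by shard along the sequence and concatenating the results matches one full pass, while only one shard's activations need to be live at a time. In the backward pass each shard's activations are recomputed and its weight-gradient contribution is added to a running total with scale 1.0.

```python
def relu(v):
    return v if v > 0.0 else 0.0

def mlp_forward(xs, w1, w2):
    """Toy per-token MLP on scalars: y_t = w2 * relu(w1 * x_t)."""
    return [w2 * relu(w1 * x) for x in xs]

def shard_bounds(length, num_shards):
    """Yield (start, stop) indices of contiguous shards along the sequence."""
    size = max(1, -(-length // num_shards))  # ceiling division, at least 1
    for start in range(0, length, size):
        yield start, min(start + size, length)

def tiled_forward(xs, w1, w2, num_shards):
    """Run the MLP one sequence shard at a time and concatenate the outputs."""
    out = []
    for start, stop in shard_bounds(len(xs), num_shards):
        out.extend(mlp_forward(xs[start:stop], w1, w2))
    return out

def tiled_weight_grads(xs, grad_out, w1, w2, num_shards):
    """Recompute each shard's activations, then add its weight grads (scale 1.0)."""
    gw1_total, gw2_total = 0.0, 0.0
    for start, stop in shard_bounds(len(xs), num_shards):
        gw1, gw2 = 0.0, 0.0
        for x, g in zip(xs[start:stop], grad_out[start:stop]):
            pre = w1 * x           # recomputed here instead of stored from forward
            gw2 += g * relu(pre)   # dL/dw2 = g * h
            if pre > 0.0:          # relu passes gradient only where pre > 0
                gw1 += g * w2 * x  # dL/dw1 = g * w2 * x
        gw1_total += gw1
        gw2_total += gw2
    return gw1_total, gw2_total

xs = [0.5, -1.0, 2.0, 3.0, 1.5, -0.25, 4.0]
grad_out = [1.0, 0.5, -2.0, 0.25, 4.0, 1.0, -0.5]
w1, w2 = 0.8, -1.5

assert tiled_forward(xs, w1, w2, num_shards=3) == mlp_forward(xs, w1, w2)
print("tiled grads:", tiled_weight_grads(xs, grad_out, w1, w2, num_shards=3))
print("full grads: ", tiled_weight_grads(xs, grad_out, w1, w2, num_shards=1))
```

With `num_shards=1` the function reduces to an ordinary full-sequence backward, so comparing the two calls is a quick consistency check; the results should agree up to floating-point rounding.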