monkeypatch.deepspeed_utils

Functions

| Name | Description |
| --- | --- |
| apply_deepspeed_patches | Apply DeepSpeed-related patches |
| patch_checkpoint_wrapper_setattr | Patch CheckpointWrapper to properly forward DeepSpeed attributes to wrapped modules. |

apply_deepspeed_patches

monkeypatch.deepspeed_utils.apply_deepspeed_patches()

Apply DeepSpeed-related patches
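
A minimal usage sketch follows. It assumes the module is importable under the `axolotl` package (the `axolotl.` prefix is not shown in the signature above and is an assumption), and that the patches should be applied before the model is wrapped for training. The import is guarded so the snippet does nothing when the package is unavailable.

```python
# Assumption: the module lives at axolotl.monkeypatch.deepspeed_utils.
# Adjust the import path if your installation exposes it differently.
try:
    from axolotl.monkeypatch.deepspeed_utils import apply_deepspeed_patches
except ImportError:
    # axolotl (or one of its dependencies) is not installed.
    apply_deepspeed_patches = None

if apply_deepspeed_patches is not None:
    # Takes no arguments, per the signature documented above.
    apply_deepspeed_patches()
```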

patch_checkpoint_wrapper_setattr

monkeypatch.deepspeed_utils.patch_checkpoint_wrapper_setattr()

Patch CheckpointWrapper to properly forward DeepSpeed attributes to wrapped modules.

This fixes the issue where CheckpointWrapper doesn’t forward ds_* attributes (like ds_grads_remaining) to the actual wrapped module, causing DeepSpeed ZeRO-3 to fail when gradient checkpointing is enabled.
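
The forwarding idea can be illustrated with a small, dependency-free sketch. This is not the actual patch: `InnerModule`, `Wrapper`, and `patch_setattr` are hypothetical stand-ins for the wrapped module, `CheckpointWrapper`, and the patch function, and the `_wrapped` attribute name is invented for the example. It only shows the general technique of overriding `__setattr__` so that `ds_*` writes land on the wrapped object.

```python
class InnerModule:
    """Hypothetical stand-in for the real (wrapped) module."""


class Wrapper:
    """Hypothetical stand-in for CheckpointWrapper.

    Reads fall through to the wrapped module via __getattr__, but by
    default writes land on the wrapper itself.
    """

    def __init__(self, module):
        self._wrapped = module

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == "_wrapped":
            raise AttributeError(name)
        return getattr(self._wrapped, name)


def patch_setattr(wrapper_cls, prefix="ds_"):
    """Make writes to attributes starting with `prefix` go to the wrapped module."""
    if getattr(wrapper_cls, "_setattr_patched", False):
        return  # already patched; keep the patch idempotent
    original_setattr = wrapper_cls.__setattr__

    def forwarding_setattr(self, name, value):
        inner = self.__dict__.get("_wrapped")
        if name.startswith(prefix) and inner is not None:
            setattr(inner, name, value)  # forward to the wrapped module
        else:
            original_setattr(self, name, value)

    wrapper_cls.__setattr__ = forwarding_setattr
    wrapper_cls._setattr_patched = True


# Before patching: the attribute is stored on the wrapper, not the module.
before_inner = InnerModule()
before = Wrapper(before_inner)
before.ds_grads_remaining = 0
seen_before = hasattr(before_inner, "ds_grads_remaining")

# After patching: the attribute reaches the wrapped module.
patch_setattr(Wrapper)
after_inner = InnerModule()
after = Wrapper(after_inner)
after.ds_grads_remaining = 0
seen_after = hasattr(after_inner, "ds_grads_remaining")
```

In this sketch `seen_before` is `False` and `seen_after` is `True`, which mirrors the mismatch described above: code that works on the wrapped module directly never sees attributes that were set on the wrapper.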

This issue occurs specifically with:
- QLoRA + DeepSpeed ZeRO-3
- gradient_checkpointing: true
- activation_offloading: true

References:
- https://github.com/deepspeedai/DeepSpeed/issues/7203
- https://github.com/deepspeedai/DeepSpeed/blob/38d1a9eb64c9e01e32eccc50b25ba18925287441/deepspeed/runtime/zero/parameter_offload.py#L424-L458
- https://github.com/axolotl-ai-cloud/axolotl/pull/3102