core.trainers.mixins.layer_offloading
Trainer mixin for layer-wise parameter offloading to CPU.
Offloads frozen (non-trainable) parameters in decoder layers to CPU, then uses forward/backward hooks to stream them on and off the GPU one layer at a time, with prefetching on a separate CUDA stream. Trainable parameters (e.g. LoRA weights) always stay on GPU.
Forward: the pre-hook loads layer N’s frozen params to GPU (and prefetches layer N+1 on the transfer stream); the post-hook offloads layer N-1’s frozen params.
Backward: the same, in reverse layer order.
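The hook schedule described above can be traced with a small, pure-Python simulation. This is only a sketch of the ordering, not the module's implementation: `simulate_pass` is a hypothetical helper, layers are plain integers, and "resident" stands in for "frozen params currently on GPU". No tensors or CUDA streams are involved.

```python
def simulate_pass(num_layers, backward=False, num_prefetch=1):
    """Trace which layers' frozen params are GPU-resident while each layer runs.

    Hypothetical helper for illustration; it models only the hook ordering.
    """
    order = list(range(num_layers))
    if backward:
        order.reverse()  # backward visits layers last-to-first

    resident = set()
    trace = []
    for i, layer in enumerate(order):
        # Pre-hook: load the current layer, then prefetch the upcoming one(s).
        resident.add(layer)
        for upcoming in order[i + 1 : i + 1 + num_prefetch]:
            resident.add(upcoming)

        # The layer computes here; record what is resident at that moment.
        trace.append((layer, sorted(resident)))

        # Post-hook: offload the layer that ran just before this one.
        if i > 0:
            resident.discard(order[i - 1])
    return trace


forward_trace = simulate_pass(4)
backward_trace = simulate_pass(4, backward=True)
```

Under these assumptions, at most three layers (previous, current, prefetched) are resident at once with `num_prefetch=1`, regardless of model depth.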
Classes
| Name | Description |
|---|---|
| LayerOffloadManager | Manages offloading frozen decoder layer params to CPU and streaming them back during forward/backward with CUDA stream overlap. |
| LayerOffloadingMixin | Trainer mixin class for layer-wise parameter offloading to CPU. |
LayerOffloadManager
core.trainers.mixins.layer_offloading.LayerOffloadManager(model, num_prefetch=1)
Manages offloading frozen decoder layer params to CPU and streaming them back during forward/backward with CUDA stream overlap.
Only frozen (requires_grad=False) parameters are offloaded. Trainable parameters (LoRA weights, etc.) remain on GPU at all times.
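The frozen-versus-trainable split can be illustrated without PyTorch. In the sketch below, `Param` is a hypothetical stand-in for a parameter object (only a name, a `requires_grad` flag, and a device string), `offload_frozen` is a hypothetical helper, and the parameter names are made up for the example. It shows only the selection rule, not how the manager actually moves tensors.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a model parameter (no real tensor data)."""
    name: str
    requires_grad: bool
    device: str = "cuda"


def offload_frozen(params):
    """Move only frozen params to CPU and return the names that were moved."""
    moved = []
    for p in params:
        if not p.requires_grad:  # frozen base weight: eligible for offloading
            p.device = "cpu"
            moved.append(p.name)
        # trainable params (e.g. LoRA adapters) are left where they are
    return moved


# Illustrative parameter names for one decoder layer with LoRA adapters.
layer_params = [
    Param("self_attn.q_proj.weight", requires_grad=False),
    Param("self_attn.q_proj.lora_A.weight", requires_grad=True),
    Param("self_attn.q_proj.lora_B.weight", requires_grad=True),
    Param("mlp.up_proj.weight", requires_grad=False),
]
moved = offload_frozen(layer_params)
devices = {p.name: p.device for p in layer_params}
```

With a real model, the same rule would presumably be applied by checking `requires_grad` on each parameter of a decoder layer.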
Methods
| Name | Description |
|---|---|
| post_step | Called after each training step to ensure layers are offloaded. |
| pre_step | Called before each training step to ensure layers start offloaded. |
| remove_hooks | Remove all hooks and restore layers to GPU. |
| setup_hooks | Register forward and backward hooks on each decoder layer. |
post_step
core.trainers.mixins.layer_offloading.LayerOffloadManager.post_step()
Called after each training step to ensure layers are offloaded.
pre_step
core.trainers.mixins.layer_offloading.LayerOffloadManager.pre_step()
Called before each training step to ensure layers start offloaded.
remove_hooks
core.trainers.mixins.layer_offloading.LayerOffloadManager.remove_hooks()
Remove all hooks and restore layers to GPU.
setup_hooks
core.trainers.mixins.layer_offloading.LayerOffloadManager.setup_hooks()
Register forward and backward hooks on each decoder layer.
LayerOffloadingMixin
core.trainers.mixins.layer_offloading.LayerOffloadingMixin(*args, **kwargs)
Trainer mixin class for layer-wise parameter offloading to CPU.
Offloads frozen decoder layer params to CPU at init, then streams them on/off GPU one layer at a time during each training step.
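One plausible way a mixin could tie the manager's lifecycle to a trainer is sketched below. Everything here is an assumption made for illustration: `StubOffloadManager` only records calls and borrows the method names documented above, while `BaseTrainer`, `OffloadingMixinSketch`, and the `training_step` override are hypothetical and not taken from the actual source.

```python
class StubOffloadManager:
    """Hypothetical stand-in that records calls instead of moving tensors."""

    def __init__(self, model, num_prefetch=1):
        self.model = model
        self.num_prefetch = num_prefetch
        self.calls = []

    def setup_hooks(self):
        self.calls.append("setup_hooks")

    def pre_step(self):
        self.calls.append("pre_step")

    def post_step(self):
        self.calls.append("post_step")

    def remove_hooks(self):
        self.calls.append("remove_hooks")


class BaseTrainer:
    """Hypothetical minimal trainer; the 'loss' is just the batch mean."""

    def __init__(self, model):
        self.model = model

    def training_step(self, batch):
        return sum(batch) / len(batch)


class OffloadingMixinSketch:
    """Assumed wiring: set up at init, bracket each training step."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # lets the base trainer set self.model
        self.offload_manager = StubOffloadManager(self.model)
        self.offload_manager.setup_hooks()

    def training_step(self, *args, **kwargs):
        self.offload_manager.pre_step()
        try:
            return super().training_step(*args, **kwargs)
        finally:
            # Runs even if the step raises, so layers end up offloaded.
            self.offload_manager.post_step()


class Trainer(OffloadingMixinSketch, BaseTrainer):
    pass


trainer = Trainer(model="toy-model")
loss = trainer.training_step([1.0, 2.0, 3.0])
trainer.offload_manager.remove_hooks()
calls = trainer.offload_manager.calls
```

Listing the mixin before the base trainer in the class definition matters: Python's method resolution order then routes `training_step` through the mixin first, which is what lets it wrap the base implementation.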