Learning Rate Groups
Setting different learning rates by module name

Background
Inspired by LoRA+, Axolotl allows practitioners to specify a separate learning rate for an individual module or a group of modules in a model.
Example
```yaml
lr_groups:
  - name: o_proj
    modules:
      - self_attn.o_proj.weight
    lr: 1e-6
  - name: q_proj
    modules:
      - model.layers.2.self_attn.q_proj.weight
    lr: 1e-5

learning_rate: 2e-5
```

In this example, the default learning rate of 2e-5 applies across the entire model, but a separate learning rate
of 1e-6 is used for the self-attention o_proj modules across all layers, and a learning rate of 1e-5 is applied
to the 3rd layer's self-attention q_proj module.
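Conceptually, a config like this maps onto PyTorch optimizer parameter groups, where each group carries its own lr. The sketch below is a minimal illustration of that mapping under assumptions of our own (the `build_param_groups` helper and its suffix-matching logic are hypothetical, not Axolotl's actual implementation):

```python
import torch

def build_param_groups(model, lr_groups, default_lr):
    """Hypothetical helper: assign per-group learning rates by
    suffix-matching parameter names; everything else gets default_lr."""
    groups, claimed = [], set()
    for group in lr_groups:
        params = [
            param
            for name, param in model.named_parameters()
            if any(name.endswith(suffix) for suffix in group["modules"])
        ]
        claimed.update(id(param) for param in params)
        groups.append({"params": params, "lr": group["lr"]})
    # Parameters not claimed by any group fall back to the default learning rate.
    groups.append({
        "params": [p for _, p in model.named_parameters() if id(p) not in claimed],
        "lr": default_lr,
    })
    return groups

# Mirrors the YAML example above; `model` would be any torch.nn.Module,
# e.g. a Hugging Face transformer.
lr_groups = [
    {"name": "o_proj", "modules": ["self_attn.o_proj.weight"], "lr": 1e-6},
    {"name": "q_proj", "modules": ["model.layers.2.self_attn.q_proj.weight"], "lr": 1e-5},
]
# optimizer = torch.optim.AdamW(build_param_groups(model, lr_groups, default_lr=2e-5))
```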
Note
We only support varying lr for now. If you're interested in adding support for other optimizer options (e.g. weight_decay), we welcome PRs. See https://github.com/axolotl-ai-cloud/axolotl/blob/613bcf90e58f3ab81d3827e7fc572319908db9fb/src/axolotl/core/trainers/mixins/optimizer.py#L17
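For context on why such an extension is feasible: PyTorch optimizers already accept per-parameter-group overrides for options like weight_decay, not just lr. The snippet below is a hypothetical sketch of what a per-group override looks like at the PyTorch level, not an Axolotl config key:

```python
import torch

# Stand-in parameter; in practice these come from the model's matched modules.
param = torch.nn.Parameter(torch.zeros(4))

optimizer = torch.optim.AdamW(
    # Each parameter group may override the optimizer-level defaults below.
    [{"params": [param], "lr": 1e-6, "weight_decay": 0.0}],
    lr=2e-5,            # default lr for groups that omit their own
    weight_decay=0.01,  # default weight decay
)
```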