core.builders.rl
Builder for RLHF trainers
Classes
| Name | Description |
|---|---|
| HFRLTrainerBuilder | Trainer factory class for TRL-based RLHF trainers (e.g. DPO) |
HFRLTrainerBuilder
core.builders.rl.HFRLTrainerBuilder(cfg, model, tokenizer, processor=None)
Trainer factory class for TRL-based RLHF trainers (e.g. DPO)
Methods
| Name | Description |
|---|---|
| build_collator | Build a data collator for preference-tuning trainers. |
build_collator
core.builders.rl.HFRLTrainerBuilder.build_collator(**kwargs)
Build a data collator for preference-tuning trainers.
Returns None for RL types that provide their own collator (e.g. GRPO,
KTO), letting the trainer construct its default. For DPO/IPO/ORPO/SIMPO,
returns an AxolotlDPODataCollatorWithPadding when
pad_to_multiple_of is set; otherwise returns None, so the trainer
falls back to the TRL default.
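
The selection rule described above can be sketched as a small standalone function. This is an illustration only, not the builder's actual code: the function name `select_collator_name`, the `DPO_STYLE_TYPES` set, the lowercase string form of the RL types, and the reading of "set" as "not None" are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical grouping for illustration; the real builder may
# represent and classify RL types differently.
DPO_STYLE_TYPES = {"dpo", "ipo", "orpo", "simpo"}


def select_collator_name(
    rl_type: str, pad_to_multiple_of: Optional[int]
) -> Optional[str]:
    """Return the name of the collator to build, or None for the trainer default."""
    if rl_type.lower() not in DPO_STYLE_TYPES:
        # e.g. GRPO, KTO: the trainer constructs its own collator.
        return None
    if pad_to_multiple_of is None:
        # Assumption: "set" means "not None". Without it, fall back
        # to the TRL default collator.
        return None
    return "AxolotlDPODataCollatorWithPadding"
```

Under these assumptions, only a DPO-style RL type combined with a padding multiple yields the custom collator; every other combination returns None and leaves the choice to the trainer.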