core.trainers.ebft.trainer
EBFT Trainer — Energy-Based Fine-Tuning integrated via GRPOTrainer.
Extends AxolotlGRPOTrainer by plugging feature-matching rewards into the standard GRPO reward function interface.
“Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models”
(Jelassi et al., 2026) https://arxiv.org/abs/2603.12248
Classes
| Name | Description |
|---|---|
| AxolotlAsyncEBFTTrainer | EBFT trainer using async GRPO (prefetches next batch during training). |
| AxolotlEBFTTrainer | EBFT trainer using synchronous GRPO (standard vLLM generation). |
| EBFTMixin | Mixin that adds EBFT feature-matching reward logic to any GRPO-based trainer. |
AxolotlAsyncEBFTTrainer
core.trainers.ebft.trainer.AxolotlAsyncEBFTTrainer(
model,
args=None,
train_dataset=None,
eval_dataset=None,
processing_class=None,
callbacks=None,
optimizers=(None, None),
peft_config=None,
)

EBFT trainer using async GRPO (prefetches the next batch during training).
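
The prefetching idea can be illustrated with a generic sketch. This is not the trainer's actual implementation: `prefetching_batches` and `fake_generate` are hypothetical names made up for this example, and a single background thread stands in for whatever generation backend is used. It only shows the general pattern of producing batch N+1 while batch N is being consumed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List


def prefetching_batches(
    generate: Callable[[int], List[str]], num_steps: int
) -> Iterator[List[str]]:
    """Yield batches in order while the next one is generated in the background.

    Hypothetical helper for illustration only.
    """
    if num_steps <= 0:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start generating the first batch before the loop begins.
        pending = pool.submit(generate, 0)
        for step in range(num_steps):
            batch = pending.result()  # wait for the batch of the current step
            if step + 1 < num_steps:
                # Kick off the next batch; it runs while the caller
                # processes (for example, trains on) the current one.
                pending = pool.submit(generate, step + 1)
            yield batch


def fake_generate(step: int) -> List[str]:
    """Toy stand-in for rollout generation (assumption, not a real API)."""
    return [f"rollout-{step}-{i}" for i in range(2)]


consumed = [batch for batch in prefetching_batches(fake_generate, 3)]
```

In a synchronous setup the call to `generate` would instead happen inline at the start of each step, so generation and training never overlap.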
AxolotlEBFTTrainer
core.trainers.ebft.trainer.AxolotlEBFTTrainer(
model,
args=None,
train_dataset=None,
eval_dataset=None,
processing_class=None,
callbacks=None,
optimizers=(None, None),
peft_config=None,
)

EBFT trainer using synchronous GRPO (standard vLLM generation).
EBFTMixin
core.trainers.ebft.trainer.EBFTMixin(
model,
args=None,
train_dataset=None,
eval_dataset=None,
processing_class=None,
callbacks=None,
optimizers=(None, None),
peft_config=None,
)

Mixin that adds EBFT feature-matching reward logic to any GRPO-based trainer.

Provides:

- Frozen feature network setup (shared weights for PEFT, deepcopy otherwise)
- _feature_matching_reward() callable for the GRPO reward function interface
- _sequential_rollout() for multi-turn conversations
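
As a rough illustration of how a mixin can hand a reward callable to a GRPO-style trainer, the sketch below uses toy stand-ins only. `ToyBaseTrainer`, `ToyFeatureRewardMixin`, `toy_features`, `cosine`, and the `references` column are hypothetical names invented for this example. The character histogram replaces the frozen feature network, and cosine similarity is an assumed scoring rule, not necessarily the one EBFT uses. The call shape (keyword arguments `prompts` and `completions` plus extra dataset columns, returning one float per completion) mirrors the convention TRL documents for custom GRPO reward functions.

```python
import math
from typing import List

Vector = List[float]


def toy_features(text: str, dim: int = 16) -> Vector:
    """Hypothetical stand-in for a frozen feature network: a character histogram."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyBaseTrainer:
    """Toy trainer that sums the outputs of its reward functions (assumption)."""

    def __init__(self, reward_funcs):
        self.reward_funcs = list(reward_funcs)

    def score(self, prompts, completions, **columns) -> List[float]:
        totals = [0.0] * len(completions)
        for fn in self.reward_funcs:
            rewards = fn(prompts=prompts, completions=completions, **columns)
            for i, reward in enumerate(rewards):
                totals[i] += reward
        return totals


class ToyFeatureRewardMixin:
    """Adds a feature-matching reward callable to whatever trainer it is mixed into."""

    feature_fn = staticmethod(toy_features)

    def _feature_matching_reward(
        self, prompts, completions, references, **kwargs
    ) -> List[float]:
        # Reward each completion by how closely its features match the reference's.
        return [
            cosine(self.feature_fn(c), self.feature_fn(r))
            for c, r in zip(completions, references)
        ]


class ToyEBFTTrainer(ToyFeatureRewardMixin, ToyBaseTrainer):
    def __init__(self):
        # Register the mixin's bound method through the ordinary reward interface.
        super().__init__(reward_funcs=[self._feature_matching_reward])


trainer = ToyEBFTTrainer()
rewards = trainer.score(
    prompts=["p", "p"],
    completions=["hello", "zzzz"],
    references=["hello", "hello"],
)
```

Because the reward is registered as an ordinary callable, the same mixin can sit in front of either a synchronous or an asynchronous base trainer without changes to the base class.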