integrations.hatchery.rl_trainer
Remote RL trainer (GRPO/PPO) using the Tinker or Hatchery API.
Full RL loop per step:
- Extract prompts from dataset batch
- Sample N completions per prompt via remote SamplingClient
- Score completions with local reward functions
- Compute GRPO-style advantages (per-group normalization)
- Send (prompt+completion, logprobs, advantages) as forward_backward
- Optimizer step
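The per-group normalization in the advantage step can be illustrated with a minimal sketch. This is not the trainer's actual code: the function name `group_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration. Each inner list holds the rewards of the N completions sampled for one prompt.

```python
from statistics import fmean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within each prompt's group of completions.

    rewards: list of groups, one group (list of floats) per prompt.
    Returns a list of the same shape with (r - mean) / (std + eps).
    Hypothetical helper; eps and population std are assumptions.
    """
    advantages = []
    for group in rewards:
        mean = fmean(group)
        std = pstdev(group)  # population std over the group's completions
        # eps keeps the division finite when every reward in a group is equal
        advantages.append([(r - mean) / (std + eps) for r in group])
    return advantages


# Two prompts, two completions each.
adv = group_advantages([[1.0, 0.0], [2.0, 2.0]])
```

In this sketch a group whose completions all receive the same reward gets zero advantages, so it contributes no gradient signal for that step.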
Classes
| Name | Description |
|---|---|
| HatcheryRLTrainer | Remote RL trainer using Tinker/Hatchery for sampling and training. |
HatcheryRLTrainer
integrations.hatchery.rl_trainer.HatcheryRLTrainer(*args, **kwargs)
Remote RL trainer using Tinker/Hatchery for sampling and training.