integrations.hatchery.rl_trainer

Remote RL trainer (GRPO/PPO) using the Tinker or Hatchery API.

Full RL loop per step

  1. Extract prompts from dataset batch
  2. Sample N completions per prompt via remote SamplingClient
  3. Score completions with local reward functions
  4. Compute GRPO-style advantages (per-group normalization)
  5. Send (prompt + completion, logprobs, advantages) as a forward_backward request
  6. Optimizer step
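
Step 4 can be illustrated with a minimal sketch of per-group advantage normalization. This is not the trainer's actual code: the function name `group_normalized_advantages`, the flat reward layout (the N completions for each prompt stored contiguously), the use of the population standard deviation, and the `eps` value are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards, group_size, eps=1e-6):
    """Normalize rewards within each group of `group_size` completions.

    Hypothetical helper: assumes `rewards` is a flat list in which the
    completions sampled for one prompt are stored next to each other.
    """
    if group_size <= 0 or len(rewards) % group_size != 0:
        raise ValueError("len(rewards) must be a positive multiple of group_size")

    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu = mean(group)
        # Population std is an assumption; an implementation may use sample std.
        sigma = pstdev(group)
        # eps keeps the division finite when every reward in a group is equal.
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


# Two prompts, four completions each.
rewards = [1.0, 0.0, 1.0, 0.0, 0.5, 0.5, 0.5, 0.5]
advs = group_normalized_advantages(rewards, group_size=4)
```

In this sketch, a group whose completions all receive the same reward yields zero advantages, so that prompt contributes no gradient signal for the step.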

Classes

Name               Description
HatcheryRLTrainer  Remote RL trainer using Tinker/Hatchery for sampling and training.

HatcheryRLTrainer

integrations.hatchery.rl_trainer.HatcheryRLTrainer(*args, **kwargs)

Remote RL trainer using Tinker/Hatchery for sampling and training.