core.trainers.grpo.replay_buffer
Simple replay buffer for storing and sampling high-signal rollout groups.
Classes
| Name | Description |
|---|---|
| ReplayBuffer | Min-heap replay buffer that keeps the highest-scoring rollout groups. |
ReplayBuffer
core.trainers.grpo.replay_buffer.ReplayBuffer(max_size)
Min-heap replay buffer that keeps the highest-scoring rollout groups. Groups are scored by signal quality (advantage magnitude * reward variance). When sampling, groups are drawn with probability proportional to their scores.
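The page does not say how the signal-quality score is computed beyond "advantage magnitude * reward variance". The sketch below is one possible reading, not the library's code: `signal_score` is a hypothetical helper, and it assumes that advantages are rewards centered on the group mean, that "advantage magnitude" means the mean absolute advantage, and that the variance is the population variance.

```python
import statistics


def signal_score(rewards):
    """Hypothetical score: mean |advantage| times reward variance.

    Assumes advantages are rewards centered on the group mean; the real
    trainer may normalize or aggregate them differently.
    """
    mean_reward = statistics.fmean(rewards)
    advantages = [r - mean_reward for r in rewards]
    advantage_magnitude = statistics.fmean([abs(a) for a in advantages])
    reward_variance = statistics.pvariance(rewards)
    return advantage_magnitude * reward_variance


# A group whose rollouts all earn the same reward carries no signal.
flat_group = signal_score([1.0, 1.0, 1.0, 1.0])
# A group with mixed outcomes scores higher.
mixed_group = signal_score([0.0, 1.0])
```

Under these assumptions a group with identical rewards scores zero, so it would be the first candidate for eviction from the buffer.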
Methods
| Name | Description |
|---|---|
| add | Add a group to the buffer. If the buffer is full, replaces the lowest-scoring entry. |
| sample | Sample groups weighted by their scores. Returns None if the buffer is empty. |
add
core.trainers.grpo.replay_buffer.ReplayBuffer.add(score, data)
Add a group to the buffer. If the buffer is full, replaces the lowest-scoring entry.
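A minimal sketch of how a bounded min-heap can implement this eviction rule, using the standard-library `heapq` module. `SketchReplayBuffer` is an illustrative stand-in, not the library's class. It assumes that a new group scoring below the current minimum is discarded rather than stored; the actual `ReplayBuffer` may instead replace the minimum unconditionally.

```python
import heapq
import itertools


class SketchReplayBuffer:
    """Illustrative stand-in for a bounded min-heap buffer."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._heap = []  # entries are (score, tie_breaker, data)
        # The counter breaks score ties so that data is never compared.
        self._counter = itertools.count()

    def add(self, score, data):
        entry = (score, next(self._counter), data)
        if len(self._heap) < self.max_size:
            heapq.heappush(self._heap, entry)
        else:
            # Push the new entry, then drop whichever entry is now smallest.
            # Assumption: a new entry below the current minimum is dropped.
            heapq.heappushpop(self._heap, entry)

    def items(self):
        """Return (score, data) pairs, lowest score first."""
        return [(score, data) for score, _, data in sorted(self._heap)]


buf = SketchReplayBuffer(max_size=2)
buf.add(0.5, "group-a")
buf.add(0.1, "group-b")
buf.add(0.9, "group-c")  # buffer is full, so "group-b" is evicted
```

Because the root of a min-heap is always the lowest-scoring entry, each eviction costs O(log n) rather than a full scan of the buffer.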
sample
core.trainers.grpo.replay_buffer.ReplayBuffer.sample(num_samples)
Sample groups weighted by their scores. Returns None if the buffer is empty.
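Score-weighted sampling can be sketched with the standard-library `random.choices`. `sample_weighted` is a hypothetical free function written for illustration. It assumes sampling with replacement, a list return value, and a uniform fallback when every score is zero; none of these details are confirmed by this page.

```python
import random


def sample_weighted(entries, num_samples, rng=random):
    """Draw groups with probability proportional to their scores.

    entries: list of (score, data) pairs.
    Returns None when there is nothing to sample from.
    """
    if not entries:
        return None
    scores = [score for score, _ in entries]
    groups = [data for _, data in entries]
    if sum(scores) <= 0:
        # Assumption: fall back to uniform sampling, because
        # random.choices rejects weights that sum to zero.
        return rng.choices(groups, k=num_samples)
    # Sampling is with replacement, so a group may appear more than once.
    return rng.choices(groups, weights=scores, k=num_samples)


empty_result = sample_weighted([], num_samples=3)
drawn = sample_weighted(
    [(1.0, "group-a"), (0.0, "group-b")],
    num_samples=5,
    rng=random.Random(0),
)
```

In this sketch a group with a score of zero is never drawn while any other group has a positive score, which matches the stated intent of favoring high-signal groups.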