core.trainers.grpo.replay_buffer

Simple replay buffer for storing and sampling high-signal rollout groups.

Classes

Name Description
ReplayBuffer Min-heap replay buffer that keeps the highest-scoring rollout groups.

ReplayBuffer

core.trainers.grpo.replay_buffer.ReplayBuffer(max_size)

Min-heap replay buffer that keeps the highest-scoring rollout groups. Each group is scored by its signal quality (advantage magnitude * reward variance). When sampling, groups are drawn with probability proportional to their scores.
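The score formula above can be illustrated with a small sketch. The exact definitions of "advantage magnitude" and "reward variance" are not given here, so this example assumes the mean absolute advantage and the population variance of the group's rewards. The function name `group_score` is hypothetical and is not part of this module.

```python
import statistics


def group_score(rewards, advantages):
    """Hypothetical signal-quality score for one rollout group.

    Assumes: advantage magnitude = mean |advantage|,
             reward variance     = population variance of the rewards.
    """
    if len(rewards) < 2 or not advantages:
        # A single rollout (or no advantages) carries no comparative signal.
        return 0.0
    adv_magnitude = sum(abs(a) for a in advantages) / len(advantages)
    reward_variance = statistics.pvariance(rewards)
    return adv_magnitude * reward_variance
```

Under these assumptions, a group whose rollouts all receive the same reward scores zero, so it would be the first candidate for eviction and would never be drawn by score-proportional sampling.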

Methods

Name Description
add Add a group to the buffer. If the buffer is full, the new group replaces the lowest-scoring entry.
sample Sample groups weighted by their scores. Returns None if the buffer is empty.

add
core.trainers.grpo.replay_buffer.ReplayBuffer.add(score, data)

Add a group to the buffer. If the buffer is full, the new group replaces the lowest-scoring entry.
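A minimal sketch of how a min-heap can support this behavior, using Python's standard `heapq` module. The class `SketchBuffer`, its `_heap` and `_counter` attributes, and the tie-breaking counter are illustrative assumptions, not the actual `ReplayBuffer` internals. The sketch also assumes `max_size >= 1`.

```python
import heapq
import itertools


class SketchBuffer:
    """Illustrative stand-in; not the actual ReplayBuffer implementation."""

    def __init__(self, max_size):
        self.max_size = max_size           # assumed to be >= 1
        self._heap = []                    # entries are (score, tiebreak, data)
        self._counter = itertools.count()  # tiebreak so `data` is never compared

    def add(self, score, data):
        entry = (score, next(self._counter), data)
        if len(self._heap) < self.max_size:
            heapq.heappush(self._heap, entry)
        else:
            # self._heap[0] is the lowest-scoring entry: pop it, push the new one.
            heapq.heapreplace(self._heap, entry)


buf = SketchBuffer(max_size=2)
buf.add(0.5, "group-a")
buf.add(0.1, "group-b")
buf.add(0.9, "group-c")  # buffer is full, so the 0.1 entry is replaced
```

Because the lowest score always sits at the root of a min-heap, finding the entry to evict is O(1) and the replacement is O(log n). This sketch follows the description literally and replaces unconditionally; whether the real `add` skips a new group that scores below the current minimum is not stated here.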

sample
core.trainers.grpo.replay_buffer.ReplayBuffer.sample(num_samples)

Sample groups weighted by their scores. Returns None if the buffer is empty.
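Score-weighted sampling can be sketched with the standard library's `random.choices`. The helper `sample_weighted` is hypothetical, and several details are assumptions: sampling is with replacement, scores are non-negative, and an all-zero buffer falls back to uniform sampling. The real `sample` method may differ on each of these points.

```python
import random


def sample_weighted(entries, num_samples, rng=random):
    """Draw `num_samples` items from (score, data) pairs, weighted by score.

    Returns None when `entries` is empty, mirroring the documented behavior.
    """
    if not entries:
        return None
    scores = [score for score, _ in entries]
    data = [item for _, item in entries]
    if sum(scores) <= 0:
        # random.choices rejects weights that sum to zero, so fall back
        # to uniform sampling (an assumption of this sketch).
        return rng.choices(data, k=num_samples)
    # Sampling is with replacement: the same group can appear more than once.
    return rng.choices(data, weights=scores, k=num_samples)


entries = [(0.9, "group-c"), (0.5, "group-a")]
batch = sample_weighted(entries, num_samples=4)
```

Callers should check for `None` before iterating over the result, since an empty buffer does not return an empty list.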