integrations.nemo_gym.data_producer
NeMo Gym Data Producer for async GRPO training.
Replaces GRPODataProducer, generating rollouts via NeMo Gym agent /run endpoints instead of vLLM. The agent handles generation, tool execution, and reward computation. The producer returns a RolloutDataset in the same format as the standard producer, so all downstream components (deferred scoring, IS correction, streaming, replay, re-roll) work unchanged.
Classes
| Name | Description |
|---|---|
| NemoGymDataProducer | Produces GRPO rollouts by calling NeMo Gym agent /run endpoints. |
NemoGymDataProducer
integrations.nemo_gym.data_producer.NemoGymDataProducer(
*args,
agent_servers,
dataset_lookup,
request_timeout=10800,
**kwargs,
)
Produces GRPO rollouts by calling NeMo Gym agent /run endpoints.
Drop-in replacement for GRPODataProducer. Instead of calling vLLM for generation, it sends prompts to NeMo Gym agents, which handle generation, tool execution, and reward computation. It returns the same RolloutDataset format, so deferred scoring, IS correction, the replay buffer, and re-roll all work unchanged.
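As a rough illustration of what calling an agent /run endpoint with a long timeout might look like, the sketch below builds (but does not send) an HTTP POST request using only the Python standard library. The server address, the payload shape, and the helper name `build_run_request` are hypothetical and are not taken from NeMo Gym or this class. Treating the `request_timeout=10800` default as seconds is also an assumption.

```python
import json
import urllib.request


def build_run_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (without sending) a JSON POST request to an agent's /run endpoint.

    Hypothetical helper for illustration; not part of NemoGymDataProducer.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address and payload shape (assumptions, not the real schema).
request = build_run_request("http://localhost:8000/", {"prompt": "2 + 2 = ?"})

# Mirrors the request_timeout default in the signature; seconds are assumed.
REQUEST_TIMEOUT = 10800

# Sending the request would look roughly like this:
# with urllib.request.urlopen(request, timeout=REQUEST_TIMEOUT) as resp:
#     result = json.loads(resp.read())
```

A generous timeout is plausible here because a single agent run can cover generation, several tool calls, and reward computation before it responds.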
Methods
| Name | Description |
|---|---|
| produce | Generate rollouts via NeMo Gym agents. |
produce
integrations.nemo_gym.data_producer.NemoGymDataProducer.produce(
model,
global_step,
*,
skip_policy_logps=False,
processing_class=None,
accelerator=None,
args=None,
_rank0_only=False,
**kwargs,
)
Generate rollouts via NeMo Gym agents.
Calls agent /run endpoints, parses responses into padded tensors, and returns a RolloutDataset for deferred scoring on the main thread.
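The "padded tensors" step can be pictured with a minimal sketch: variable-length token ID sequences are padded to a common length, and a mask records which positions hold real tokens. This version uses plain Python lists in place of tensors, and the function name `pad_sequences`, the right-padding choice, and the pad ID of 0 are assumptions for illustration. The actual producer may pad differently (for example, on the left) and return additional fields.

```python
def pad_sequences(sequences, pad_id=0):
    """Right-pad token ID sequences to equal length and build a matching mask.

    Hypothetical helper for illustration; not the producer's own code.
    """
    max_len = max((len(seq) for seq in sequences), default=0)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


# Three completions of different lengths, as made-up token IDs.
completions = [[11, 12, 13], [21], [31, 32]]
padded_ids, attention_mask = pad_sequences(completions, pad_id=0)
```

With a mask alongside the padded IDs, downstream steps can ignore padding positions when computing per-token quantities such as log-probabilities.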