integrations.nemo_gym.data_producer

NeMo Gym Data Producer for async GRPO training.

Replaces GRPODataProducer, generating rollouts through NeMo Gym agent /run endpoints instead of vLLM. The agent handles generation, tool execution, and reward computation. The producer returns a RolloutDataset in the same format as the standard producer, so all downstream components (deferred scoring, importance-sampling (IS) correction, streaming, replay, re-roll) work unchanged.
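The "drop-in replacement" idea rests on both producers sharing one entry point and one output type. The sketch below illustrates that contract with typing.Protocol. RolloutBatch, DataProducer, LocalGenerationProducer, AgentProducer, and train_step are all hypothetical names invented for illustration; RolloutBatch is a simplified stand-in, not the real RolloutDataset, and the agent response is stubbed.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class RolloutBatch:
    # Hypothetical stand-in for RolloutDataset; the fields are illustrative only.
    prompts: List[str]
    completions: List[str]
    rewards: List[float]


class DataProducer(Protocol):
    # Both producers expose the same produce() entry point and the same output type.
    def produce(self, model: object, global_step: int) -> RolloutBatch: ...


class LocalGenerationProducer:
    """Hypothetical analogue of GRPODataProducer: generates and scores locally."""

    def produce(self, model: object, global_step: int) -> RolloutBatch:
        prompts = ["2+2?"]
        completions = ["4"]  # pretend this came from local generation
        rewards = [1.0 if c == "4" else 0.0 for c in completions]
        return RolloutBatch(prompts, completions, rewards)


class AgentProducer:
    """Hypothetical analogue of NemoGymDataProducer: an external agent returns completion and reward."""

    def produce(self, model: object, global_step: int) -> RolloutBatch:
        agent_result = {"completion": "4", "reward": 1.0}  # stubbed agent response
        return RolloutBatch(["2+2?"], [agent_result["completion"]], [agent_result["reward"]])


def train_step(producer: DataProducer, global_step: int) -> float:
    # Downstream code depends only on RolloutBatch, so either producer works unchanged.
    batch = producer.produce(model=None, global_step=global_step)
    return sum(batch.rewards) / len(batch.rewards)
```

Because Protocol uses structural typing, neither producer has to inherit from a common base; matching the produce() signature and return type is enough for a type checker to accept both.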

Classes

Name	Description
NemoGymDataProducer	Produces GRPO rollouts by calling NeMo Gym agent /run endpoints.

NemoGymDataProducer

integrations.nemo_gym.data_producer.NemoGymDataProducer(
    *args,
    agent_servers,
    dataset_lookup,
    request_timeout=10800,
    **kwargs,
)

Produces GRPO rollouts by calling NeMo Gym agent /run endpoints.

Drop-in replacement for GRPODataProducer. Instead of calling vLLM for generation, it sends prompts to NeMo Gym agents, which handle generation, tool execution, and reward computation. It returns the same RolloutDataset format, so deferred scoring, IS correction, the replay buffer, and re-roll all work unchanged.
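A minimal sketch of fanning prompts out to several agent servers and collecting results in prompt order, assuming round-robin assignment and a thread pool. The real producer's scheduling, request payload, and response schema are not described here; dispatch_to_agents, run_fn, fake_run, and the example server URLs are hypothetical. run_fn stands in for the request to an agent's /run endpoint, and request_timeout is simply forwarded to it.

```python
import concurrent.futures
from itertools import cycle
from typing import Callable, Dict, List


def dispatch_to_agents(
    prompts: List[str],
    agent_servers: List[str],
    run_fn: Callable[[str, str, float], Dict],
    request_timeout: float = 10800,
) -> List[Dict]:
    """Send each prompt to an agent server, round-robin, and return results in prompt order."""
    if not agent_servers:
        raise ValueError("need at least one agent server")
    # Pair prompt i with server i % len(agent_servers).
    assignments = list(zip(prompts, cycle(agent_servers)))
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(agent_servers)) as pool:
        # run_fn is a hypothetical hook standing in for a request to the agent's /run
        # endpoint; the timeout is forwarded to it unchanged.
        futures = [pool.submit(run_fn, server, prompt, request_timeout)
                   for prompt, server in assignments]
        # result() blocks until each call finishes; the list preserves input order.
        return [f.result() for f in futures]


def fake_run(server: str, prompt: str, timeout: float) -> Dict:
    # Stub agent: pretends it generated a completion, ran tools, and scored it.
    return {"server": server, "completion": prompt.upper(), "reward": float(len(prompt))}


results = dispatch_to_agents(["a", "bb", "ccc"], ["http://agent-0", "http://agent-1"], fake_run)
```

Threads suit this pattern because each call mostly waits on network I/O to a remote agent rather than doing CPU work locally.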

Methods

Name	Description
produce	Generate rollouts via NeMo Gym agents.

produce

integrations.nemo_gym.data_producer.NemoGymDataProducer.produce(
    model,
    global_step,
    *,
    skip_policy_logps=False,
    processing_class=None,
    accelerator=None,
    args=None,
    _rank0_only=False,
    **kwargs,
)

Generate rollouts via NeMo Gym agents.

Calls agent /run endpoints, parses responses into padded tensors, and returns a RolloutDataset for deferred scoring on the main thread.
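The step "parses responses into padded tensors" can be sketched as padding variable-length token-id sequences to a common length with a matching mask. This assumes right-padding and a caller-supplied pad token; the actual padding side, pad token, and tensor layout used by the producer are not stated in this reference, and pad_completions is a hypothetical helper. Plain lists stand in for tensors so the sketch runs without PyTorch.

```python
from typing import List, Tuple


def pad_completions(
    sequences: List[List[int]], pad_token_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token-id lists to the longest length in the batch.

    Returns (padded_ids, mask): mask is 1 for real tokens and 0 for padding.
    Passing padded_ids to torch.tensor would give a 2-D (batch, max_len) tensor.
    """
    max_len = max((len(s) for s in sequences), default=0)
    padded, mask = [], []
    for seq in sequences:
        pad = max_len - len(seq)
        padded.append(seq + [pad_token_id] * pad)
        mask.append([1] * len(seq) + [0] * pad)
    return padded, mask
```

For example, pad_completions([[5, 6, 7], [8]], pad_token_id=0) returns ([[5, 6, 7], [8, 0, 0]], [[1, 1, 1], [1, 0, 0]]); the mask lets later loss and log-probability computations ignore the padded positions.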