integrations.nemo_gym.multi_turn

integrations.nemo_gym.multi_turn

Multi-turn rollout function for NeMo Gym environments.

Delegates multi-turn orchestration to NeMo Gym’s agent servers via the /run endpoint. The agent handles generation (by calling our vLLM server), tool execution, session management, and reward computation.

This follows the same pattern as TRL’s reference implementation at examples/scripts/nemo_gym/train_multi_environment.py.

Architecture

rollout_func(prompts, trainer) -> expand prompts by num_generations -> async POST /run to agent servers (one per sample) -> parse response: prompt_ids, completion_ids, logprobs, env_mask, reward -> return to TRL for GRPO training

Functions

Name	Description
create_nemo_gym_rollout_func	Create a TRL-compatible rollout_func that delegates to NeMo Gym agents.

create_nemo_gym_rollout_func

integrations.nemo_gym.multi_turn.create_nemo_gym_rollout_func(
    agent_servers,
    dataset_lookup,
    request_timeout=10800,
)

Create a TRL-compatible rollout_func that delegates to NeMo Gym agents.

Parameters

Name	Type	Description	Default
agent_servers	dict[str, str]	Mapping of agent_name → agent URL (e.g., {“simple_agent”: “http://host:port”}).	required
dataset_lookup	dict[int, dict]	Mapping of dataset index → full JSONL row dict.	required
request_timeout	float	HTTP timeout for /run requests.	`10800`

Returns

Name	Type	Description
		A rollout_func with signature (prompts: list[str], trainer) -> dict.