integrations.nemo_gym.rewards
NeMo Gym reward functions.
Provides ready-to-use reward functions for axolotl configs::

    trl:
      reward_funcs:
        # Multi-turn: passthrough reward from agent /run
        - axolotl.integrations.nemo_gym.rewards.reward_env
        # Single-turn: call /verify endpoints directly
        - axolotl.integrations.nemo_gym.rewards.reward_nemo_gym_verify
Functions
| Name | Description |
|---|---|
| create_nemo_gym_reward_fn | Create a reward function bound to specific verify endpoints. |
| reward_env | Passthrough: extract pre-computed reward from NeMo Gym agent /run response. |
| reward_nemo_gym_verify | Call NeMo Gym /verify endpoint for each completion (single-turn). |
create_nemo_gym_reward_fn
integrations.nemo_gym.rewards.create_nemo_gym_reward_fn(
    global_config,
    verify_endpoints,
    model_name='axolotl-model',
    verify_timeout=30,
)
Create a reward function bound to specific verify endpoints.
Used internally by NemoGymPlugin._wire_single_turn() to inject a
reward function that already knows the endpoint map (no discovery needed).
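The binding idea can be illustrated with a closure. The following is only a sketch and not the plugin's actual implementation: `make_bound_reward_fn`, the injected `verify` callable, and the assumption that `resources_server_ref` arrives as one string key per completion are all hypothetical names and shapes chosen for illustration.

```python
from typing import Callable, Dict, List, Optional


def make_bound_reward_fn(
    verify_endpoints: Dict[str, str],
    verify: Callable[[str, str, float], float],
    verify_timeout: float = 30,
) -> Callable[..., List[float]]:
    """Hypothetical factory: capture the endpoint map once, reuse it per call."""

    def reward_fn(completions, prompts=None, **kwargs) -> List[float]:
        # Assumption: one server reference (a plain string key) per completion.
        refs = kwargs.get("resources_server_ref", [])
        rewards: List[float] = []
        for i, completion in enumerate(completions):
            ref: Optional[str] = refs[i] if i < len(refs) else None
            url = verify_endpoints.get(ref)
            if url is None:
                # Assumption: unknown endpoints score 0.0 rather than raising.
                rewards.append(0.0)
                continue
            # `verify` stands in for the HTTP call to the /verify endpoint.
            rewards.append(float(verify(url, completion, verify_timeout)))
        return rewards

    return reward_fn
```

Because the endpoint map and timeout are captured at construction time, the returned function needs no lookup or discovery step when the trainer calls it.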
reward_env
integrations.nemo_gym.rewards.reward_env(completions, prompts=None, **kwargs)
Passthrough: extract pre-computed reward from NeMo Gym agent /run response.
The NemoGymDataProducer injects env_reward into each sample’s
kwargs after the agent returns from /run. This function simply
forwards that value so TRL can log it alongside other reward signals.
Use this in your config when nemo_gym_multi_turn: true::

    trl:
      reward_funcs:
        - axolotl.integrations.nemo_gym.rewards.reward_env
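A passthrough reward of this kind might look roughly like the sketch below. It is not the library's source: `passthrough_reward` is a hypothetical name, and both the list-per-batch shape of `env_reward` and the 0.0 fallback when the key is absent are assumptions made for illustration.

```python
from typing import List


def passthrough_reward(completions, prompts=None, **kwargs) -> List[float]:
    """Hypothetical passthrough: forward rewards that were computed upstream."""
    # Assumption: env_reward is a list aligned one-to-one with completions.
    env_rewards = kwargs.get("env_reward")
    if env_rewards is None:
        # Assumption: fall back to 0.0 when no reward was injected.
        return [0.0] * len(completions)
    return [float(r) for r in env_rewards]


# Usage: the reward already exists, so nothing is recomputed here.
print(passthrough_reward(["a", "b"], env_reward=[1, 0.5]))
```

The function performs no scoring of its own, which is why it fits the multi-turn case where the agent has already produced the reward.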
reward_nemo_gym_verify
integrations.nemo_gym.rewards.reward_nemo_gym_verify(
    completions,
    prompts=None,
    **kwargs,
)
Call NeMo Gym /verify endpoint for each completion (single-turn).
Requires resources_server_ref and verify_extra kwargs, which the
NeMo Gym dataset loader injects automatically.
Use this in your config when nemo_gym_multi_turn: false::

    trl:
      reward_funcs:
        - axolotl.integrations.nemo_gym.rewards.reward_nemo_gym_verify
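Since this function depends on two injected kwargs, a guard like the one below can help when debugging a custom dataset that bypasses the NeMo Gym loader. This is a sketch only: `check_verify_kwargs` is a hypothetical helper that is not part of axolotl, and it assumes each kwarg arrives as a list with one entry per completion.

```python
from typing import Any, List, Tuple

REQUIRED_KWARGS = ("resources_server_ref", "verify_extra")


def check_verify_kwargs(completions, **kwargs) -> List[Tuple[Any, Any, Any]]:
    """Hypothetical guard: pair each completion with its verify metadata."""
    missing = [name for name in REQUIRED_KWARGS if name not in kwargs]
    if missing:
        raise KeyError(f"missing reward kwargs: {', '.join(missing)}")
    for name in REQUIRED_KWARGS:
        # Assumption: one entry per completion for every required kwarg.
        if len(kwargs[name]) != len(completions):
            raise ValueError(
                f"{name} has {len(kwargs[name])} entries, "
                f"expected {len(completions)}"
            )
    return list(
        zip(completions, kwargs["resources_server_ref"], kwargs["verify_extra"])
    )
```

Failing early with a named kwarg tends to be easier to diagnose than an error raised deep inside a request loop.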