integrations.hatchery.rewards.math_reward
Math reward function for hendrycks_math GRPO training.
Uses math_verify for robust answer comparison, and falls back to an exact string match on the answer content only when math_verify is unavailable.
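A minimal sketch of the comparison policy described above. It assumes the `parse` and `verify` functions exported by the `math_verify` package (Hugging Face Math-Verify). The helper name `answers_match` is hypothetical, and treating a comparison error as a mismatch is an assumption, not documented behavior of this module.

```python
# Prefer math_verify when it is installed; otherwise use exact string match.
try:
    from math_verify import parse, verify
    HAVE_MATH_VERIFY = True
except ImportError:
    HAVE_MATH_VERIFY = False


def answers_match(gold: str, predicted: str) -> bool:
    """Hypothetical helper: True if predicted is equivalent to gold."""
    if HAVE_MATH_VERIFY:
        try:
            # parse() turns each string into candidate expressions;
            # verify() checks mathematical equivalence between them.
            return bool(verify(parse(gold), parse(predicted)))
        except Exception:
            # Assumption: a failed comparison earns no credit.
            return False
    # Fallback used only when math_verify cannot be imported.
    return predicted.strip() == gold.strip()
```

The fallback is deliberately strict: without math_verify, equivalent forms such as `0.5` and `\frac{1}{2}` are scored as different answers.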
Functions
| Name | Description |
|---|---|
| extract_boxed | Extract the answer, handling nested braces. |
| math_reward | Score completions by checking whether the answer matches the gold answer. |
extract_boxed
integrations.hatchery.rewards.math_reward.extract_boxed(text)
Extract the answer, handling nested braces.
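The following sketch illustrates one way to extract a `\boxed{...}` answer while respecting nested braces. It assumes the answer is the contents of the last `\boxed{` in the text and that `None` is returned when no complete box is found. `extract_boxed_sketch` is a hypothetical name, and the real `extract_boxed` may differ (for example, in first-versus-last match or `\fbox` support).

```python
from typing import Optional


def extract_boxed_sketch(text: str) -> Optional[str]:
    """Hypothetical: contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)  # assumption: the last box holds the answer
    if start == -1:
        return None
    content_start = start + len(marker)  # first char after the opening brace
    depth = 1  # we are already inside one "{"
    for i in range(content_start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: return everything between.
                return text[content_start:i]
    return None  # unbalanced braces: no complete answer
```

Tracking brace depth, rather than using a non-greedy regex such as `\\boxed\{(.*?)\}`, is what keeps an answer like `\frac{1}{2}` intact instead of truncating it at the first closing brace.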
math_reward
integrations.hatchery.rewards.math_reward.math_reward(
prompts,
completions,
**kwargs,
)
Score completions by checking whether the answer matches the gold answer.
The gold answer is extracted from the prompt, where the dataset preprocessing appends it as a hidden tag in the format: … <|gold|>ANSWER<|/gold|>
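A minimal sketch of the scoring flow, assuming prompts and completions are plain strings, the reward is 1.0 for a match and 0.0 otherwise, and a missing gold tag or missing boxed answer scores 0.0. The helpers `extract_gold`, `last_boxed`, and `math_reward_sketch` are hypothetical, and exact string match stands in for math_verify to keep the example self-contained.

```python
import re
from typing import List, Optional

# Hidden gold tag appended by preprocessing: <|gold|>ANSWER<|/gold|>
GOLD_RE = re.compile(r"<\|gold\|>(.*?)<\|/gold\|>", re.DOTALL)


def extract_gold(prompt: str) -> Optional[str]:
    """Hypothetical: text inside the last gold tag, or None if absent."""
    matches = GOLD_RE.findall(prompt)
    return matches[-1].strip() if matches else None


def last_boxed(text: str) -> Optional[str]:
    """Hypothetical: contents of the last \\boxed{...}, nested-brace aware."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    begin = start + len("\\boxed{")
    depth = 1
    for j in range(begin, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:j]
    return None


def math_reward_sketch(prompts: List[str], completions: List[str], **kwargs) -> List[float]:
    """Hypothetical reward: one float per (prompt, completion) pair."""
    rewards = []
    for prompt, completion in zip(prompts, completions):
        gold = extract_gold(prompt)
        answer = last_boxed(completion)
        if gold is None or answer is None:
            rewards.append(0.0)  # no reference or no answer: no credit
        else:
            # Exact match stands in for math_verify in this sketch.
            rewards.append(1.0 if answer.strip() == gold else 0.0)
    return rewards
```

Extra keyword arguments are accepted and ignored here, mirroring the `**kwargs` in the documented signature.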