integrations.hatchery.rewards.math_reward

Math reward function for hendrycks_math GRPO training.

Uses math_verify for robust answer comparison. When math_verify is unavailable, falls back to an exact string match on the \boxed{} content.

Functions

Name Description
extract_boxed Extract the \boxed{...} answer, handling nested braces.
math_reward Score completions by checking whether the \boxed{} answer matches the gold answer.

extract_boxed

integrations.hatchery.rewards.math_reward.extract_boxed(text)

Extract the \boxed{...} answer, handling nested braces.
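The brace handling can be illustrated with a minimal sketch. This is not the module's actual implementation: the name `extract_boxed_sketch`, the choice to read the last `\boxed{` occurrence, and the `None` return for missing or unbalanced input are all assumptions made for illustration.

```python
def extract_boxed_sketch(text):
    """Return the content of the last \\boxed{...} in text, or None.

    Hypothetical helper: the real extract_boxed may pick a different
    occurrence or return a different sentinel.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer present

    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected is the answer.
                return "".join(chars)
        chars.append(ch)
    return None  # braces never balanced


print(extract_boxed_sketch(r"The answer is \boxed{\frac{1}{2}}."))
```

Counting brace depth, rather than stopping at the first `}`, is what keeps nested LaTeX such as `\frac{1}{2}` intact.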

math_reward

integrations.hatchery.rewards.math_reward.math_reward(
    prompts,
    completions,
    **kwargs,
)

Score completions by checking whether the \boxed{} answer matches the gold answer.

The gold answer is extracted from the prompt, where the dataset preprocessing appends it as a hidden tag. Format: … <|gold|>ANSWER<|/gold|>
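A rough sketch of this flow, covering only the exact-string fallback path, might look like the following. Several things here are assumptions rather than documented behavior: that prompts and completions are plain strings, that rewards are returned as a list of 1.0/0.0 floats, and that a missing tag or missing boxed answer scores 0.0. The names `math_reward_sketch`, `GOLD_RE`, and `_boxed` are hypothetical.

```python
import re

# Assumed pattern for the hidden tag described above.
GOLD_RE = re.compile(r"<\|gold\|>(.*?)<\|/gold\|>", re.DOTALL)


def _boxed(text):
    """Hypothetical helper: content of the last \\boxed{...}, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth, chars = 1, []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None


def math_reward_sketch(prompts, completions, **kwargs):
    """Score each completion with exact string matching only (assumed 1.0/0.0 scale)."""
    rewards = []
    for prompt, completion in zip(prompts, completions):
        gold = GOLD_RE.search(prompt)
        answer = _boxed(completion)
        if gold is None or answer is None:
            rewards.append(0.0)  # assumption: missing tag or answer scores zero
            continue
        match = answer.strip() == gold.group(1).strip()
        rewards.append(1.0 if match else 0.0)
    return rewards


prompts = ["What is 2 + 2? <|gold|>4<|/gold|>"] * 2
completions = [r"So the result is \boxed{4}.", "The result is 4."]
print(math_reward_sketch(prompts, completions))
```

Exact matching treats equivalent forms such as `0.5` and `\frac{1}{2}` as different answers, which is why the module prefers math_verify when it is available.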