integrations.hatchery.data

Convert axolotl batch tensors to Tinker/Hatchery Datum format.

Both Tinker and Hatchery expect the client to apply the causal LM shift:

    Original tokens: [t0, t1, t2, …, t_{L-1}]
    model_input:     [t0, t1, …, t_{L-2}]   (last token dropped)
    target_tokens:   [t1, t2, …, t_{L-1}]   (first token dropped)
    weights:         [w1, w2, …, w_{L-1}]   (aligned to targets)

At position i, the model sees t_i and predicts target_tokens[i] = t_{i+1}.
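The shift described above can be illustrated with a minimal, library-free sketch. The helper name `causal_shift` and the use of plain Python lists are assumptions made for illustration; the module itself operates on batch tensors and may be implemented differently.

```python
from typing import List, Sequence, Tuple


def causal_shift(
    tokens: Sequence[int], weights: Sequence[float]
) -> Tuple[List[int], List[int], List[float]]:
    """Hypothetical helper: split one sequence into (model_input, target_tokens, weights).

    `weights` holds one value per original token. The first weight is dropped
    so that the remaining values line up with the targets.
    """
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form an input/target pair")
    model_input = list(tokens[:-1])      # last token dropped
    target_tokens = list(tokens[1:])     # first token dropped
    shifted_weights = [float(w) for w in weights[1:]]  # aligned to targets
    return model_input, target_tokens, shifted_weights


# Four tokens, with only the last two contributing to the loss.
model_input, target_tokens, shifted_weights = causal_shift(
    [10, 11, 12, 13], [0, 0, 1, 1]
)
```

All three returned lists have length L-1, so position i of `model_input` pairs with position i of `target_tokens` and `shifted_weights`.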

Functions

Name                  Description
batch_to_datums_rl    Convert an RL batch to importance_sampling/ppo Datum dicts with causal shift.
batch_to_datums_sft   Convert an axolotl SFT batch to Datum dicts with causal shift.
datums_to_tinker      Wrap plain-dict datums into tinker.types.Datum objects.

batch_to_datums_rl

integrations.hatchery.data.batch_to_datums_rl(
    input_ids,
    labels,
    logprobs,
    advantages,
    attention_mask=None,
)

Convert an RL batch to importance_sampling/ppo Datum dicts with causal shift.
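As a rough illustration of what the causal shift means for RL inputs, the sketch below handles a single sequence using plain lists. Everything here is an assumption rather than the module's actual behavior: the helper name `sketch_rl_datum`, the dict keys, the idea that `logprobs` and `advantages` hold one value per original token, and the use of -100 in `labels` to mark positions excluded from the loss.

```python
from typing import Dict, Sequence

IGNORE_INDEX = -100  # assumed label-masking convention (Hugging Face style)


def sketch_rl_datum(
    tokens: Sequence[int],
    labels: Sequence[int],
    logprobs: Sequence[float],
    advantages: Sequence[float],
) -> Dict[str, list]:
    """Hypothetical single-sequence version of the RL conversion."""
    n = len(tokens)
    if not (len(labels) == len(logprobs) == len(advantages) == n):
        raise ValueError("all per-token inputs must have the same length")
    if n < 2:
        raise ValueError("need at least two tokens to form an input/target pair")
    return {
        "model_input": list(tokens[:-1]),    # last token dropped
        "target_tokens": list(tokens[1:]),   # first token dropped
        # Per-token values are shifted the same way as the targets.
        "logprobs": [float(lp) for lp in logprobs[1:]],
        # Assumed masking: targets labelled IGNORE_INDEX get zero advantage.
        "advantages": [
            0.0 if lab == IGNORE_INDEX else float(adv)
            for lab, adv in zip(labels[1:], advantages[1:])
        ],
    }


rl_datum = sketch_rl_datum(
    tokens=[5, 6, 7, 8],
    labels=[-100, -100, 7, 8],
    logprobs=[0.0, -0.5, -1.0, -2.0],
    advantages=[0.5, 0.5, 0.5, 0.5],
)
```

Zeroing the advantage on masked targets is one way to keep prompt tokens out of the policy-gradient term; the real function may express the mask differently.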

batch_to_datums_sft

integrations.hatchery.data.batch_to_datums_sft(
    input_ids,
    labels,
    attention_mask=None,
)

Convert an axolotl SFT batch to Datum dicts with causal shift.
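A minimal sketch of how such a conversion could look, using nested Python lists in place of tensors. The helper name `sketch_sft_datums`, the dict keys, the -100 ignore index in `labels`, and the rule that padded positions (attention mask 0) are dropped before shifting are all assumptions for illustration, not a description of the actual implementation.

```python
from typing import Dict, List, Optional

IGNORE_INDEX = -100  # assumed label-masking convention (Hugging Face style)


def sketch_sft_datums(
    input_ids: List[List[int]],
    labels: List[List[int]],
    attention_mask: Optional[List[List[int]]] = None,
) -> List[Dict[str, list]]:
    """Hypothetical batch conversion: one dict per sequence, causally shifted."""
    datums = []
    for row, (ids, labs) in enumerate(zip(input_ids, labels)):
        mask = attention_mask[row] if attention_mask is not None else [1] * len(ids)
        # Keep only real (non-padding) positions.
        keep = [i for i, m in enumerate(mask) if m]
        ids = [ids[i] for i in keep]
        labs = [labs[i] for i in keep]
        if len(ids) < 2:
            continue  # nothing to predict from fewer than two tokens
        datums.append(
            {
                "model_input": ids[:-1],    # last token dropped
                "target_tokens": ids[1:],   # first token dropped
                # Weight 1.0 where the target is trained on, 0.0 where masked.
                "weights": [0.0 if lab == IGNORE_INDEX else 1.0 for lab in labs[1:]],
            }
        )
    return datums


sft_datums = sketch_sft_datums(
    input_ids=[[1, 2, 3, 4, 0]],
    labels=[[-100, -100, 3, 4, -100]],
    attention_mask=[[1, 1, 1, 1, 0]],
)
```

In this example the trailing padding token is removed first, so the resulting datum has three input positions and the prompt-only target receives weight 0.0.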

datums_to_tinker

integrations.hatchery.data.datums_to_tinker(datums)

Wrap plain-dict datums into tinker.types.Datum objects.

Both the Tinker SDK and the updated Hatchery client accept these objects.