integrations.hatchery.data
Convert axolotl batch tensors to Tinker/Hatchery Datum format.
Both Tinker and Hatchery expect the client to apply the causal LM shift:
- Original tokens: [t0, t1, t2, …, t_{L-1}]
- model_input: [t0, t1, …, t_{L-2}] (last token dropped)
- target_tokens: [t1, t2, …, t_{L-1}] (first token dropped)
- weights: [w1, w2, …, w_{L-1}] (aligned to targets)
At position i, the model attends to t_0, …, t_i and predicts target_tokens[i] = t_{i+1}.
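As a minimal illustration of the shift described above, the pure-Python sketch below applies it to a single sequence. Plain lists stand in for the batch tensors, and the helper name `causal_shift` is hypothetical. It is not part of this module.

```python
from typing import List, Tuple


def causal_shift(
    tokens: List[int], weights: List[float]
) -> Tuple[List[int], List[int], List[float]]:
    """Hypothetical helper: apply the causal LM shift to one sequence.

    Returns (model_input, target_tokens, shifted_weights).
    """
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form one (input, target) pair")
    model_input = tokens[:-1]      # [t0, ..., t_{L-2}]  (last token dropped)
    target_tokens = tokens[1:]     # [t1, ..., t_{L-1}]  (first token dropped)
    shifted_weights = weights[1:]  # [w1, ..., w_{L-1}]  (aligned to target_tokens)
    return model_input, target_tokens, shifted_weights


tokens = [101, 7, 8, 9]
weights = [0.0, 0.0, 1.0, 1.0]
model_input, target_tokens, w = causal_shift(tokens, weights)
# model_input   -> [101, 7, 8]
# target_tokens -> [7, 8, 9]
# w             -> [0.0, 1.0, 1.0]
```

Each `target_tokens[i]` equals `tokens[i + 1]`, which is the next-token prediction target for `model_input[i]`.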
Functions
| Name | Description |
|---|---|
| batch_to_datums_rl | Convert an RL batch to importance_sampling/ppo Datum dicts with causal shift. |
| batch_to_datums_sft | Convert an axolotl SFT batch to Datum dicts with causal shift. |
| datums_to_tinker | Wrap plain-dict datums into tinker.types.Datum objects. |
batch_to_datums_rl
integrations.hatchery.data.batch_to_datums_rl(
input_ids,
labels,
logprobs,
advantages,
attention_mask=None,
)
Convert an RL batch to importance_sampling/ppo Datum dicts with causal shift.
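The following is a hedged sketch of how an RL conversion with causal shift could work. It is not the actual body of batch_to_datums_rl. It assumes: plain nested lists in place of torch tensors; per-token logprobs and advantages aligned with input_ids; right padding, so the attention-mask sum gives the true length; -100 in labels marking tokens excluded from the loss; and a hypothetical dict layout whose loss_fn_inputs keys (target_tokens, logprobs, advantages) are assumed to mirror Tinker's importance_sampling inputs. The function name `rl_batch_to_datum_dicts` is hypothetical.

```python
from typing import Dict, List, Optional

# Assumption: -100 marks label positions excluded from the loss (HF/axolotl convention).
IGNORE_INDEX = -100


def rl_batch_to_datum_dicts(
    input_ids: List[List[int]],
    labels: List[List[int]],
    logprobs: List[List[float]],
    advantages: List[List[float]],
    attention_mask: Optional[List[List[int]]] = None,
) -> List[Dict]:
    """Hypothetical sketch: one plain-dict datum per sequence, causally shifted."""
    datums: List[Dict] = []
    for b, ids in enumerate(input_ids):
        # Assumption: right padding, so the true length is the count of 1s in the mask.
        length = sum(attention_mask[b]) if attention_mask is not None else len(ids)
        if length < 2:
            continue  # too short to form a single (input, target) pair
        ids = ids[:length]
        target_labels = labels[b][1:length]
        datums.append({
            "model_input": ids[:-1],                # [t0, ..., t_{L-2}]
            "loss_fn_inputs": {
                "target_tokens": ids[1:],           # [t1, ..., t_{L-1}]
                "logprobs": logprobs[b][1:length],  # sampling logprob of each target
                # Zero the advantage wherever the target label is masked (e.g. prompt).
                "advantages": [
                    0.0 if lab == IGNORE_INDEX else adv
                    for lab, adv in zip(target_labels, advantages[b][1:length])
                ],
            },
        })
    return datums


batch = rl_batch_to_datum_dicts(
    input_ids=[[1, 2, 3, 4, 0]],
    labels=[[-100, -100, 3, 4, -100]],
    logprobs=[[0.0, -0.1, -0.2, -0.3, 0.0]],
    advantages=[[0.5, 0.5, 0.5, 0.5, 0.5]],
    attention_mask=[[1, 1, 1, 1, 0]],
)
# batch[0]["model_input"]                     -> [1, 2, 3]
# batch[0]["loss_fn_inputs"]["target_tokens"] -> [2, 3, 4]
# batch[0]["loss_fn_inputs"]["advantages"]    -> [0.0, 0.5, 0.5]
```

Logprobs and advantages are shifted exactly like the weights in the SFT case: the value attached to token t_i ends up at the position whose target is t_i.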
batch_to_datums_sft
integrations.hatchery.data.batch_to_datums_sft(
input_ids,
labels,
attention_mask=None,
)
Convert an axolotl SFT batch to Datum dicts with causal shift.
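A minimal sketch of an SFT conversion with causal shift is shown below. It is not the actual body of batch_to_datums_sft. It assumes plain nested lists instead of torch tensors, right padding (so the mask sum gives the true length), -100 in labels for tokens that should not be trained on, and a hypothetical dict layout whose loss_fn_inputs keys (target_tokens, weights) are assumed to mirror Tinker's cross-entropy inputs. The name `sft_batch_to_datum_dicts` is hypothetical.

```python
from typing import Dict, List, Optional

# Assumption: -100 marks label positions excluded from the loss (HF/axolotl convention).
IGNORE_INDEX = -100


def sft_batch_to_datum_dicts(
    input_ids: List[List[int]],
    labels: List[List[int]],
    attention_mask: Optional[List[List[int]]] = None,
) -> List[Dict]:
    """Hypothetical sketch: one plain-dict datum per sequence, causally shifted."""
    datums: List[Dict] = []
    for b, ids in enumerate(input_ids):
        # Assumption: right padding, so the true length is the count of 1s in the mask.
        length = sum(attention_mask[b]) if attention_mask is not None else len(ids)
        if length < 2:
            continue  # too short to form a single (input, target) pair
        ids = ids[:length]
        target_labels = labels[b][1:length]
        datums.append({
            "model_input": ids[:-1],       # last token dropped
            "loss_fn_inputs": {
                "target_tokens": ids[1:],  # first token dropped
                # Weight 1.0 for trained targets, 0.0 for masked ones (e.g. the prompt).
                "weights": [0.0 if lab == IGNORE_INDEX else 1.0 for lab in target_labels],
            },
        })
    return datums


batch = sft_batch_to_datum_dicts(
    input_ids=[[5, 6, 7, 8], [9, 10, 0, 0]],
    labels=[[-100, -100, 7, 8], [-100, 10, -100, -100]],
    attention_mask=[[1, 1, 1, 1], [1, 1, 0, 0]],
)
# batch[0]["loss_fn_inputs"]["weights"] -> [0.0, 1.0, 1.0]
# batch[1]["model_input"]               -> [9]
```

Deriving the weights from the shifted labels keeps them aligned to target_tokens, so a prompt token is never a weighted prediction target.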
datums_to_tinker
integrations.hatchery.data.datums_to_tinker(datums)
Wrap plain-dict datums into tinker.types.Datum objects.
Both the Tinker SDK and the updated Hatchery client accept these objects.