utils.data.sft

Data handling specific to SFT.

Functions

| Name | Description |
| --- | --- |
| prepare_datasets | Prepare training and evaluation datasets based on configuration. |

prepare_datasets

utils.data.sft.prepare_datasets(cfg, tokenizer, processor=None)

Prepare training and evaluation datasets based on configuration.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| cfg | DictDefault | Dictionary mapping axolotl config keys to values. | required |
| tokenizer | PreTrainedTokenizer | Tokenizer to use for processing text. | required |
| processor | ProcessorMixin \| None | Optional processor for multimodal datasets. | None |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | tuple[IterableDataset \| Dataset, Dataset \| None, int, list[Prompter \| None]] | Tuple of (train_dataset, eval_dataset, total_steps, prompters). |
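
Because the return value is a positional 4-tuple in which eval_dataset and individual prompters may be None, callers may find it convenient to give the fields names before using them. The sketch below is one way to do that. `PreparedData` and `describe` are hypothetical helpers introduced here for illustration and are not part of the module; the values passed in are stand-ins rather than real datasets, so the block runs without the library installed.

```python
from typing import Any, NamedTuple, Optional, Sequence


class PreparedData(NamedTuple):
    """Hypothetical named view over the 4-tuple described under Returns."""

    train_dataset: Any            # Dataset or IterableDataset
    eval_dataset: Optional[Any]   # None when no evaluation split is produced
    total_steps: int
    prompters: Sequence[Any]      # entries may be None


def describe(prepared: PreparedData) -> dict:
    """Summarise the tuple without assuming an eval split exists.

    len() is deliberately not called on train_dataset, because an
    iterable (streaming) dataset may not define a length.
    """
    return {
        "has_eval": prepared.eval_dataset is not None,
        "total_steps": prepared.total_steps,
        "num_prompters": len(prepared.prompters),
        "prompters_missing": sum(p is None for p in prepared.prompters),
    }


# Stand-in values only. With the library available, the tuple would come
# from the documented call instead:
#   prepared = PreparedData(*prepare_datasets(cfg, tokenizer))
raw_result = ([{"input_ids": [1, 2, 3]}], None, 100, [None])
prepared = PreparedData(*raw_result)
summary = describe(prepared)
print(summary)
```

Unpacking with `PreparedData(*result)` keeps the positional order shown in the Returns table, so the wrapper stays valid only as long as that order does.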