utils.data.sft
Data handling specific to supervised fine-tuning (SFT).
Functions
Name | Description |
---|---|
prepare_datasets | Prepare training and evaluation datasets based on configuration. |
prepare_datasets
utils.data.sft.prepare_datasets(cfg, tokenizer, processor=None)
Prepare training and evaluation datasets based on configuration.
Parameters
Name | Type | Description | Default |
---|---|---|---|
cfg | DictDefault | Dictionary mapping axolotl config keys to values. | required |
tokenizer | PreTrainedTokenizer | Tokenizer to use for processing text. | required |
processor | ProcessorMixin \| None | Optional processor for multimodal datasets. | None |
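A minimal sketch of the kind of object `cfg` is, assuming `DictDefault` behaves like a `dict` whose keys can also be used as attributes and where a missing key reads as `None` instead of raising. `DictDefaultSketch` is a hypothetical stand-in, not axolotl's class (for example, it does not convert nested dicts). The config keys shown are illustrative only and are not a list of what `prepare_datasets` actually reads.

```python
class DictDefaultSketch(dict):
    """Hypothetical stand-in for a DictDefault-style config object.

    Assumed behavior: a plain dict whose keys can also be read and written
    as attributes, and where a missing key yields None instead of raising.
    """

    def __missing__(self, key):
        # dict.__getitem__ calls this for absent keys; return None instead of KeyError.
        return None

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so dict methods still work.
        # Dunder names raise as usual so copy/pickle protocols are not confused.
        if name.startswith("__"):
            raise AttributeError(name)
        return self[name]

    def __setattr__(self, name, value):
        # Store attributes as dict entries so cfg.x and cfg["x"] stay in sync.
        self[name] = value


# Illustrative keys only; the file path is a hypothetical placeholder.
cfg = DictDefaultSketch(sequence_len=2048, datasets=[{"path": "my_data.jsonl"}])
print(cfg.sequence_len)   # 2048
print(cfg.val_set_size)   # None (key not set)
```

Returning `None` for unset keys lets code test optional settings with a plain truthiness check rather than wrapping every lookup in `get` or `try`.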
Returns
Type | Description |
---|---|
tuple[IterableDataset \| Dataset, Dataset \| None, int, list[Prompter \| None]] | Tuple of (train_dataset, eval_dataset, total_steps, prompters). |
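A minimal sketch of consuming the returned tuple without importing axolotl. `PreparedData`, `unpack_prepared`, `has_eval`, and `active_prompters` are hypothetical helpers, and plain lists stand in for the `Dataset`/`IterableDataset` objects. The sketch encodes only the documented contract: `eval_dataset` may be `None`, `total_steps` is an `int`, and entries of `prompters` may be `None`. The non-negative check on `total_steps` is an added assumption.

```python
from typing import Any, NamedTuple, Optional


class PreparedData(NamedTuple):
    """Named view of the (train_dataset, eval_dataset, total_steps, prompters) tuple."""
    train_dataset: Any
    eval_dataset: Optional[Any]
    total_steps: int
    prompters: list


def unpack_prepared(result: tuple) -> PreparedData:
    # Hypothetical helper: give the positional 4-tuple named fields.
    train_dataset, eval_dataset, total_steps, prompters = result
    # Assumption: a valid step count is a non-negative int.
    if not isinstance(total_steps, int) or total_steps < 0:
        raise ValueError(f"expected a non-negative int for total_steps, got {total_steps!r}")
    return PreparedData(train_dataset, eval_dataset, total_steps, list(prompters))


def has_eval(data: PreparedData) -> bool:
    # eval_dataset is documented as `Dataset | None`; None means no eval split.
    return data.eval_dataset is not None


def active_prompters(data: PreparedData) -> list:
    # Entries are `Prompter | None`; drop the None placeholders.
    return [p for p in data.prompters if p is not None]


# Usage with stand-in values (lists and strings instead of real datasets/prompters).
fake_result = (["ex1", "ex2"], None, 10, [None, "prompter-A"])
data = unpack_prepared(fake_result)
print(has_eval(data), data.total_steps, active_prompters(data))  # False 10 ['prompter-A']
```

Wrapping the tuple in a `NamedTuple` keeps positional unpacking working while letting later code refer to fields by name, which avoids mixing up `train_dataset` and `eval_dataset`.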