utils.data.wrappers

Data handling specific to supervised fine-tuning (SFT).

Functions

Name                             Description
get_dataset_wrapper              Create an appropriate dataset wrapper and prompter based on dataset configuration.
handle_unknown_dataset_strategy  Raise an error for an unknown dataset strategy.

get_dataset_wrapper

utils.data.wrappers.get_dataset_wrapper(
    dataset_config,
    tokenizer,
    cfg,
    dataset_base_type,
    dataset,
    dataset_prompt_style=None,
    processor=None,
)

Create an appropriate dataset wrapper and prompter based on dataset configuration.

Parameters

Name                  Type                       Description                                  Default
dataset_config        DictDefault                Configuration for the dataset.               required
tokenizer             PreTrainedTokenizer        Tokenizer to use for processing text.        required
cfg                   DictDefault                Global configuration object.                 required
dataset_base_type     str | None                 The base type of the dataset.                required
dataset               Dataset | IterableDataset  The actual dataset object.                   required
dataset_prompt_style  str | None                 Optional prompt style specification.         None
processor             ProcessorMixin | None      Optional processor for multimodal datasets.  None

Returns

Name Type Description
tuple[Dataset | IterableDataset, Prompter | None] tuple of (dataset_wrapper, dataset_prompter).
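The return shape above can be illustrated with a minimal, self-contained sketch. It does not call the real function: `ToyPrompter`, `toy_get_dataset_wrapper`, the `"instruct"` type and the `pretokenized` key are all hypothetical stand-ins, chosen only to show a factory that returns a `(dataset_wrapper, dataset_prompter)` pair in which the prompter may be `None`.

```python
from typing import Optional


class ToyPrompter:
    """Hypothetical prompter that only records a prompt style."""

    def __init__(self, style: Optional[str] = None) -> None:
        self.style = style or "default"


def toy_get_dataset_wrapper(
    dataset_config: dict,
    dataset: list,
    dataset_base_type: Optional[str],
    dataset_prompt_style: Optional[str] = None,
) -> tuple:
    """Return (dataset_wrapper, dataset_prompter), mirroring the documented shape."""
    # Assumption for this sketch: pre-tokenized data needs no prompter.
    if dataset_config.get("pretokenized"):
        return dataset, None
    # Assumption for this sketch: "instruct" is the only known base type.
    if dataset_base_type == "instruct":
        prompter = ToyPrompter(dataset_prompt_style)
        wrapped = [
            {"text": f"[{prompter.style}] {row['instruction']}"} for row in dataset
        ]
        return wrapped, prompter
    raise ValueError(f"unhandled dataset type: {dataset_base_type!r}")


rows = [{"instruction": "Say hi"}]
wrapped, prompter = toy_get_dataset_wrapper(
    {"type": "instruct"}, rows, "instruct", "chat"
)
passthrough, no_prompter = toy_get_dataset_wrapper({"pretokenized": True}, rows, None)
```

Callers should unpack both elements and be prepared for the second to be `None`, since the documented type is `Prompter | None`.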

handle_unknown_dataset_strategy

utils.data.wrappers.handle_unknown_dataset_strategy(dataset_config)

Raise an error for an unknown dataset strategy.
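As a rough sketch of this fail-fast pattern, the hypothetical helper below always raises once no known strategy has matched. The exception type (`ValueError`), the message wording and the `type` key are assumptions made for illustration; the page does not state what the real function raises.

```python
def toy_handle_unknown_dataset_strategy(dataset_config: dict) -> None:
    """Always raise; intended as the last branch after all known strategies."""
    # Assumption: the strategy name lives under a "type" key.
    ds_type = dataset_config.get("type")
    raise ValueError(f"unhandled dataset strategy: {ds_type!r}")


try:
    toy_handle_unknown_dataset_strategy({"type": "no_such_strategy"})
except ValueError as err:
    message = str(err)
```

Including the offending value in the message makes a misconfigured dataset entry easier to locate.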