utils.data.wrappers
Data handling specific to SFT.
Functions
| Name | Description |
|---|---|
| get_dataset_wrapper | Create an appropriate dataset wrapper and prompter based on dataset configuration. |
| handle_unknown_dataset_strategy | Raise an error for an unknown dataset strategy. |
get_dataset_wrapper
utils.data.wrappers.get_dataset_wrapper(
    dataset_config,
    tokenizer,
    cfg,
    dataset_base_type,
    dataset,
    dataset_prompt_style=None,
    processor=None,
)
Create an appropriate dataset wrapper and prompter based on dataset configuration.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_config | DictDefault | Configuration for the dataset. | required |
| tokenizer | PreTrainedTokenizer | Tokenizer to use for processing text. | required |
| cfg | DictDefault | Global configuration object. | required |
| dataset_base_type | str \| None | The base type of the dataset. | required |
| dataset | Dataset \| IterableDataset | The actual dataset object. | required |
| dataset_prompt_style | str \| None | Optional prompt style specification. | None |
| processor | ProcessorMixin \| None | Optional processor for multimodal datasets. | None |
Returns
| Name | Type | Description |
|---|---|---|
| | tuple[Dataset \| IterableDataset, Prompter \| None] | Tuple of (dataset_wrapper, dataset_prompter). |
handle_unknown_dataset_strategy
utils.data.wrappers.handle_unknown_dataset_strategy(dataset_config)
Raise an error for an unknown dataset strategy.
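The two functions above suggest a dispatch pattern: look up a wrapper for the configured dataset type, return a `(dataset_wrapper, dataset_prompter)` pair, and raise when no strategy matches. The following is a minimal, self-contained sketch of that pattern only. It does not call the real module; `WRAPPER_FACTORIES`, `wrap_dataset`, `raise_unknown_strategy`, the `"type"` key, and the use of `ValueError` are all hypothetical stand-ins chosen for illustration.

```python
from typing import Any, Callable

# Hypothetical registry mapping a dataset type string to a wrapper factory.
# This is NOT the library's real lookup logic, only an illustration.
WRAPPER_FACTORIES: dict[str, Callable[[list[dict]], tuple[list[dict], Any]]] = {
    "alpaca": lambda rows: (rows, "alpaca-prompter"),
    # A strategy without a prompter returns None in the second slot.
    "completion": lambda rows: (rows, None),
}


def raise_unknown_strategy(dataset_config: dict) -> None:
    """Stand-in for handle_unknown_dataset_strategy: always raises.

    The exception type (ValueError) is an assumption.
    """
    raise ValueError(f"unhandled dataset type: {dataset_config.get('type')!r}")


def wrap_dataset(dataset_config: dict, dataset: list[dict]) -> tuple[list[dict], Any]:
    """Stand-in for get_dataset_wrapper: returns (dataset_wrapper, dataset_prompter)."""
    factory = WRAPPER_FACTORIES.get(dataset_config.get("type"))
    if factory is None:
        raise_unknown_strategy(dataset_config)
    return factory(dataset)


rows = [{"text": "hello"}]
# The prompter may be None, so callers should check before using it.
wrapped, prompter = wrap_dataset({"type": "completion"}, rows)
if prompter is None:
    print("no prompter for this strategy")
```

Callers of the real `get_dataset_wrapper` would similarly unpack the returned tuple and handle a `None` prompter, per the return type documented above.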