common.datasets
Dataset loading utilities.
Classes
Name | Description |
---|---|
TrainDatasetMeta | Dataclass with fields for training and validation datasets and metadata. |
TrainDatasetMeta
common.datasets.TrainDatasetMeta(
train_dataset,
eval_dataset=None,
total_num_steps=None)
Dataclass with fields for training and validation datasets and metadata.
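As a rough illustration of the shape described above, the following sketch defines a stand-in dataclass with the same three field names. The class name `TrainDatasetMetaSketch`, the type annotations, and the assumption that `eval_dataset` and `total_num_steps` default to `None` are illustrative and are not taken from the library source.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TrainDatasetMetaSketch:
    """Hypothetical stand-in for TrainDatasetMeta; field types are assumptions."""

    # Training split; required.
    train_dataset: Any
    # Optional evaluation split (assumed to default to None).
    eval_dataset: Optional[Any] = None
    # Total number of training steps (assumed to default to None).
    total_num_steps: Optional[int] = None


# Only the training dataset is required; the other fields fall back to None.
meta = TrainDatasetMetaSketch(train_dataset=["example 1", "example 2"])
full_meta = TrainDatasetMetaSketch(
    train_dataset=["example 1", "example 2"],
    eval_dataset=["held-out example"],
    total_num_steps=10,
)
```

Grouping the datasets with the step count in one object lets a loader return a single value instead of a tuple whose order callers must remember.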
Functions
Name | Description |
---|---|
load_datasets | Loads one or more training or evaluation datasets, calling axolotl.utils.data.prepare_datasets. |
load_preference_datasets | Loads one or more training or evaluation datasets for RL training using paired preference data. |
sample_dataset | Randomly samples num_samples samples with replacement from dataset. |
load_datasets
common.datasets.load_datasets(cfg, cli_args=None, debug=False)
Loads one or more training or evaluation datasets, calling axolotl.utils.data.prepare_datasets. Optionally logs debug information.
Parameters
Name | Type | Description | Default |
---|---|---|---|
cfg | DictDefault | Dictionary mapping axolotl config keys to values. | required |
cli_args | PreprocessCliArgs \| TrainerCliArgs \| None | Command-specific CLI arguments. | None |
debug | bool | Whether to print the tokenization of a sample. This duplicates settings in cfg and cli_args, but is kept because our Colab notebooks use it. | False |
Returns
Type | Description |
---|---|
TrainDatasetMeta | Dataclass with fields for training and evaluation datasets and the computed total_num_steps. |
load_preference_datasets
common.datasets.load_preference_datasets(cfg, cli_args)
Loads one or more training or evaluation datasets for RL training using paired preference data, calling axolotl.utils.data.rl.prepare_preference_datasets. Optionally logs debug information.
Parameters
Name | Type | Description | Default |
---|---|---|---|
cfg | DictDefault | Dictionary mapping axolotl config keys to values. | required |
cli_args | PreprocessCliArgs \| TrainerCliArgs | Command-specific CLI arguments. | required |
Returns
Type | Description |
---|---|
TrainDatasetMeta | Dataclass with fields for training and evaluation datasets and the computed total_num_steps. |
sample_dataset
common.datasets.sample_dataset(dataset, num_samples)
Randomly samples num_samples samples with replacement from dataset.
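To make "with replacement" concrete, here is a minimal standard-library sketch of the idea. It is not the library's implementation: the helper name `sample_with_replacement`, the `seed` parameter, and the use of a plain Python sequence in place of a dataset object are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_with_replacement(
    dataset: Sequence[T], num_samples: int, seed: Optional[int] = None
) -> List[T]:
    """Hypothetical helper: draw num_samples items, allowing repeats."""
    rng = random.Random(seed)
    # Each index is drawn independently, so the same row can appear more
    # than once and num_samples may exceed len(dataset).
    indices = [rng.randrange(len(dataset)) for _ in range(num_samples)]
    return [dataset[i] for i in indices]


rows = ["a", "b", "c"]
sampled = sample_with_replacement(rows, num_samples=5, seed=0)
```

Because each draw is independent, the result can be longer than the source and can contain duplicates, which is the key difference from sampling without replacement.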