utils.samplers.multipack

Multipack Batch Sampler

Classes

Name Description
MultipackBatchSampler Batch sampler class for multipack

MultipackBatchSampler

utils.samplers.multipack.MultipackBatchSampler(
    self,
    sampler,
    batch_size,
    batch_max_len,
    lengths,
    packing_efficiency_estimate=1.0,
    drop_last=False,
    num_count_samples=16,
    sequential=False,
    **kwargs,
)

Batch sampler class for multipack

Functions

Name Description
allocate_sequentially Sequential allocator that preserves example order

allocate_sequentially

utils.samplers.multipack.allocate_sequentially(lengths, rank, c, n)

Sequential allocator that preserves example order

Parameters:
- lengths: The lengths of all examples
- rank: The current rank (for distributed training)
- c: The capacity of each bin (maximum sequence length)
- n: Number of ranks

Returns:
- result: List of batches for the current rank
- total_used: Number of actual example tokens
- total_slots: Maximum theoretical number of example tokens (number of bins * bin capacity)
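
To make the parameters and return values above concrete, here is a minimal pure-Python sketch of an order-preserving allocator. It is an illustration under stated assumptions, not the library's implementation. The function name `allocate_sequentially_sketch` is hypothetical. The sketch also assumes that bins are handed to ranks round-robin, that examples longer than `c` are skipped, and that `total_used` and `total_slots` are counted over all bins rather than per rank. The real function may differ on each of these points.

```python
from typing import List, Sequence, Tuple


def allocate_sequentially_sketch(
    lengths: Sequence[int], rank: int, c: int, n: int
) -> Tuple[List[List[int]], int, int]:
    """Pack example indices into bins of capacity c without reordering them."""
    bins: List[List[int]] = []
    current: List[int] = []
    remaining = c
    total_used = 0

    for idx, length in enumerate(lengths):
        if length > c:
            # Assumption: an example that cannot fit in any bin is skipped.
            continue
        if length > remaining:
            # The current bin is full: close it and open a new one.
            bins.append(current)
            current = []
            remaining = c
        current.append(idx)
        remaining -= length
        total_used += length

    if current:
        bins.append(current)

    # Assumption: bins are distributed to ranks round-robin.
    result = bins[rank::n]
    total_slots = len(bins) * c
    return result, total_used, total_slots


lengths = [4, 3, 5, 2, 6, 1]
batches, used, slots = allocate_sequentially_sketch(lengths, rank=0, c=8, n=2)
print(batches)       # [[0, 1], [4, 5]]
print(used, slots)   # 21 24
```

In this sketch the ratio `total_used / total_slots` (21 / 24 here) gives a rough measure of how densely the bins are packed. Because order is preserved, the packing is typically less dense than one produced by an allocator that is free to reorder examples.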