integrations.diffusion.generation
Sample generation utilities for diffusion training.
Functions
| Name | Description |
|---|---|
| generate | Generate a single sample using reverse diffusion. |
| generate_samples | Generate text samples using the diffusion model by randomly masking sequences from the given dataset and running the reverse diffusion process. |
generate
integrations.diffusion.generation.generate(
model,
tokenizer,
original_sequence,
num_diffusion_steps,
temperature,
mask_token_id,
*,
mode='random',
completion_tokens=0,
target_mask_ratio=None,
labels=None,
attention_mask=None,
)
Generate a single sample using reverse diffusion.
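In masked diffusion language models, reverse diffusion typically begins with a sequence in which some positions hold `mask_token_id`, then fills those positions in over `num_diffusion_steps` iterations. The plain-Python sketch below illustrates the idea under stated assumptions. `predict_logits` is a hypothetical stand-in for the model's forward pass. Decoding is greedy (the `temperature=0.0` case). Masked positions are revealed in order of model confidence, a schedule some masked-diffusion samplers use. `generate` may schedule unmasking differently.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical model interface: takes a token sequence and returns one list of
# logits (one score per vocabulary id) for every position.
LogitFn = Callable[[Sequence[int]], List[List[float]]]


def reverse_diffusion_greedy(
    predict_logits: LogitFn,
    tokens: Sequence[int],
    mask_token_id: int,
    num_diffusion_steps: int,
) -> List[int]:
    """Fill masked positions over several steps, most confident first (sketch)."""
    seq = list(tokens)
    for step in range(num_diffusion_steps):
        masked = [i for i, t in enumerate(seq) if t == mask_token_id]
        if not masked:
            break  # nothing left to denoise
        logits = predict_logits(seq)
        candidates = []
        for i in masked:
            row = logits[i]
            # Greedy choice, never emitting the mask token itself.
            best = max(
                (v for v in range(len(row)) if v != mask_token_id),
                key=row.__getitem__,
            )
            # Softmax probability of the chosen token, used as its confidence.
            denom = sum(math.exp(x - row[best]) for x in row)
            candidates.append((1.0 / denom, i, best))
        # Reveal an even share of the remaining masks so that every mask is
        # filled by the final step.
        k = math.ceil(len(masked) / (num_diffusion_steps - step))
        candidates.sort(reverse=True)
        for _, i, tok in candidates[:k]:
            seq[i] = tok
    return seq


def toy_logits(seq):
    # Toy "model": vocabulary ids 0..4 (4 is the mask); prefers id i % 4 at position i.
    return [[5.0 if v == i % 4 else 0.0 for v in range(5)] for i in range(len(seq))]


print(reverse_diffusion_greedy(toy_logits, [0, 4, 4, 3, 4], mask_token_id=4, num_diffusion_steps=2))
# → [0, 1, 2, 3, 0]
```

With two steps and three masks, the first step reveals two positions and the second step reveals the last one. More steps reveal fewer tokens per step, so each choice can condition on more already-decoded context.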
generate_samples
integrations.diffusion.generation.generate_samples(
model,
tokenizer,
dataloader=None,
num_generation_samples=3,
max_length=100,
num_diffusion_steps=128,
temperature=0.0,
mask_token_id=32000,
mode='random',
completion_tokens=0,
target_mask_ratio=None,
)
Generate text samples using the diffusion model by randomly masking sequences from the given dataset and running the reverse diffusion process.
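This page does not describe how `mode`, `completion_tokens`, and `target_mask_ratio` shape the masking step, so the sketch below assumes one plausible reading. In `'random'` mode, each position is masked independently with probability `target_mask_ratio`, which is drawn uniformly from [0, 1) when it is `None`. In a hypothetical `'completion'` mode, the last `completion_tokens` positions are masked. The helper name `mask_sequence` and the `'completion'` mode are illustrative only.

```python
import random
from typing import List, Optional, Sequence


def mask_sequence(
    tokens: Sequence[int],
    mask_token_id: int,
    mode: str = "random",
    completion_tokens: int = 0,
    target_mask_ratio: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return a copy of `tokens` with some positions set to `mask_token_id` (sketch)."""
    rng = rng or random.Random()
    seq = list(tokens)
    if mode == "completion":
        # Assumed behavior: mask the last `completion_tokens` positions.
        n = max(0, min(completion_tokens, len(seq)))
        for i in range(len(seq) - n, len(seq)):
            seq[i] = mask_token_id
        return seq
    if mode != "random":
        raise ValueError(f"unsupported mode: {mode!r}")
    # Assumed behavior: draw a ratio in [0, 1) when none is given, then mask
    # each position independently with that probability.
    ratio = rng.random() if target_mask_ratio is None else target_mask_ratio
    return [mask_token_id if rng.random() < ratio else t for t in seq]


print(mask_sequence([5, 6, 7, 8], mask_token_id=32000, mode="completion", completion_tokens=2))
# → [5, 6, 32000, 32000]
```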
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| model | torch.nn.Module | The diffusion model to sample from, either wrapped or unwrapped | required |
| tokenizer | Any | Tokenizer for encoding/decoding | required |
| dataloader | Optional[Any] | Validation dataloader from which sequences are sampled | None |
| num_generation_samples | int | Number of samples to generate | 3 |
| max_length | int | Maximum length of sequences to use | 100 |
| num_diffusion_steps | int | Number of diffusion steps for generation | 128 |
| temperature | float | Temperature for sampling (0.0 = deterministic) | 0.0 |
| mask_token_id | int | Token ID used for masking | 32000 |
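The `temperature` row states that 0.0 gives deterministic output. A common way to get that behavior is to take the argmax when the temperature is zero and otherwise sample from a softmax of the logits divided by the temperature. The sketch below shows this under that assumption; it is not confirmed to be the module's own code, and `sample_token` is a hypothetical helper.

```python
import math
import random
from typing import Optional, Sequence


def sample_token(
    logits: Sequence[float],
    temperature: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick a token id from one position's logits (sketch)."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0.0:
        # Deterministic: always the highest-scoring token.
        return max(range(len(logits)), key=logits.__getitem__)
    rng = rng or random.Random()
    # Temperature-scaled softmax, shifted by the max logit for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


print(sample_token([0.1, 2.0, 0.5], temperature=0.0))
# → 1
```

Higher temperatures flatten the distribution and increase diversity. Temperatures close to zero behave almost like the argmax.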
Returns
| Name | Type | Description |
|---|---|---|
|  | List[dict] | List of dictionaries with the original text, masked text, and generated text |
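The returned dictionaries hold the original, masked, and generated text, but this page does not name their keys. The logging helper below therefore assumes the hypothetical keys `original`, `masked`, and `generated`. Rename them to match the dictionaries that `generate_samples` actually returns.

```python
from typing import Dict, List


def format_samples(samples: List[Dict[str, str]]) -> str:
    """Render generated samples as a readable block for logs (sketch)."""
    lines = []
    for idx, sample in enumerate(samples, start=1):
        # Hypothetical key names; check the dictionaries generate_samples returns.
        lines.append(f"--- sample {idx} ---")
        lines.append(f"original:  {sample.get('original', '')}")
        lines.append(f"masked:    {sample.get('masked', '')}")
        lines.append(f"generated: {sample.get('generated', '')}")
    return "\n".join(lines)
```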