integrations.diffusion.generation

Sample generation utilities for diffusion training.

Functions

Name Description
generate Generate a single sample using reverse diffusion.
generate_samples Generate text samples using the diffusion model by randomly masking sequences from the given dataset and running the reverse diffusion process.

generate

integrations.diffusion.generation.generate(
    model,
    tokenizer,
    original_sequence,
    num_diffusion_steps,
    temperature,
    mask_token_id,
    *,
    mode='random',
    completion_tokens=0,
    target_mask_ratio=None,
    labels=None,
    attention_mask=None,
)

Generate a single sample using reverse diffusion.
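To make "reverse diffusion" concrete, here is a minimal toy sketch of an iterative unmasking loop. It is not this module's implementation: `toy_predict` and `toy_reverse_diffusion` are hypothetical names, the predictor is a stub standing in for a model forward pass, and the "reveal the most confident positions first, in roughly equal shares per step" schedule is an assumption chosen for illustration.

```python
MASK_ID = 32000  # same value as the documented default for mask_token_id


def toy_predict(tokens):
    """Hypothetical stand-in for a model forward pass.

    Returns one (token_id, confidence) guess per position.
    """
    return [((i * 7) % 100, 1.0 / (i + 1)) for i in range(len(tokens))]


def toy_reverse_diffusion(tokens, num_diffusion_steps, mask_token_id=MASK_ID):
    """Fill masked positions over at most num_diffusion_steps iterations."""
    tokens = list(tokens)
    for step in range(num_diffusion_steps):
        masked = [i for i, t in enumerate(tokens) if t == mask_token_id]
        if not masked:
            break  # nothing left to denoise
        steps_left = num_diffusion_steps - step
        # Assumed schedule: reveal an equal share of the remaining masks
        # on each step (ceiling division so the last step finishes the job).
        n_reveal = max(1, -(-len(masked) // steps_left))
        guesses = toy_predict(tokens)
        # Unmask the most confident positions first.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:n_reveal]:
            tokens[i] = guesses[i][0]
    return tokens


sequence = [1, MASK_ID, 3, MASK_ID, MASK_ID]
print(toy_reverse_diffusion(sequence, num_diffusion_steps=2))
```

Unmasked tokens are left untouched, so the original context is preserved while the masked positions are filled in progressively.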

generate_samples

integrations.diffusion.generation.generate_samples(
    model,
    tokenizer,
    dataloader=None,
    num_generation_samples=3,
    max_length=100,
    num_diffusion_steps=128,
    temperature=0.0,
    mask_token_id=32000,
    mode='random',
    completion_tokens=0,
    target_mask_ratio=None,
)

Generate text samples using the diffusion model by randomly masking sequences from the given dataset and running the reverse diffusion process.
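The random-masking step described above can be pictured with a small standalone sketch. `toy_random_mask` is a hypothetical helper, not part of this module, and it requires an explicit mask ratio; how the library chooses a ratio when `target_mask_ratio` is `None` is not stated here.

```python
import random


def toy_random_mask(token_ids, mask_ratio, mask_token_id=32000, seed=None):
    """Replace a random subset of positions with mask_token_id.

    Returns the masked sequence and the sorted list of masked positions.
    """
    rng = random.Random(seed)
    n_mask = round(len(token_ids) * mask_ratio)
    # Sample distinct positions so exactly n_mask tokens are masked.
    positions = set(rng.sample(range(len(token_ids)), n_mask))
    masked = [
        mask_token_id if i in positions else tok
        for i, tok in enumerate(token_ids)
    ]
    return masked, sorted(positions)


masked, positions = toy_random_mask(list(range(10)), mask_ratio=0.5, seed=0)
print(masked, positions)
```

Passing a fixed `seed` makes the masked positions reproducible, which can help when comparing generated samples across runs.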

Parameters

Name Type Description Default
model torch.nn.Module The wrapped or unwrapped model required
tokenizer Any Tokenizer for encoding/decoding required
dataloader Optional[Any] Validation dataloader (for sampling sequences) None
num_generation_samples int Number of samples to generate 3
max_length int Maximum length of sequences to use 100
num_diffusion_steps int Number of diffusion steps for generation 128
temperature float Temperature for sampling (0.0 = deterministic) 0.0
mask_token_id int Token ID used for masking 32000
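The note "0.0 = deterministic" on `temperature` follows the usual convention: a temperature of zero means picking the highest-scoring token, while a positive temperature means sampling from a softmax over scaled logits. A minimal sketch of that convention, using the hypothetical helper `toy_sample_token` (an illustration, not the code this module runs):

```python
import math
import random


def toy_sample_token(logits, temperature, rng=None):
    """Pick a token index from raw logits under the given temperature."""
    if temperature <= 0.0:
        # Deterministic case: greedy argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0]
print(toy_sample_token(logits, temperature=0.0))
print(toy_sample_token(logits, temperature=1.0, rng=random.Random(0)))
```

Lower positive temperatures concentrate probability on the top-scoring token; higher values flatten the distribution and produce more varied samples.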

Returns

Name Type Description
List[dict] List of dictionaries with original text, masked text, and generated text