integrations.diffusion.utils

Shared utilities for diffusion integration.

Functions

Name                                   Description
create_bidirectional_attention_mask    Create bidirectional attention mask to override default causal masking.
resolve_mask_token_id                  Resolve mask token id. Training may add a new special token; inference won’t.
shift_logits_to_input_positions        Align next-token logits with their input token positions for diffusion.

create_bidirectional_attention_mask

integrations.diffusion.utils.create_bidirectional_attention_mask(
    input_ids,
    attention_mask=None,
    sample_packing=False,
)

Create a bidirectional attention mask that overrides the default causal masking. The function also handles sample-packed sequences, in which each packed sample is identified by a distinct value in the attention mask.

Parameters

Name            Type                    Description                              Default
input_ids       torch.Tensor            Input token ids [batch_size, seq_len]    required
attention_mask  Optional[torch.Tensor]  Attention mask [batch_size, seq_len]     None
sample_packing  bool                    Whether sample packing is enabled        False

Returns

Name                Type          Description
bidirectional_mask  torch.Tensor  4D attention mask [batch_size, 1, seq_len, seq_len]
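
The masking rule described above can be illustrated with a small dependency-free sketch. It is not the library implementation: plain nested lists stand in for torch tensors, the name `bidirectional_mask_sketch` is hypothetical, and two behaviours are assumptions rather than documented facts: that a mask value of 0 marks padding, and that without sample packing every position may attend to every other position.

```python
def bidirectional_mask_sketch(attention_mask, sample_packing=False):
    """Build a [batch][1][seq][seq] nested list of booleans.

    attention_mask: list of rows, one per batch element. With sample
    packing, each row holds a per-token sample id (assumed: 0 = padding).
    """
    masks = []
    for row in attention_mask:
        n = len(row)
        if sample_packing:
            # Position i may attend to position j only if both belong to
            # the same packed sample and that sample id is non-zero.
            grid = [
                [row[i] == row[j] and row[i] > 0 for j in range(n)]
                for i in range(n)
            ]
        else:
            # Assumption: no packing means fully bidirectional attention.
            grid = [[True] * n for _ in range(n)]
        # Wrap in a singleton list to mimic the broadcastable head dimension.
        masks.append([grid])
    return masks


# Two packed samples (ids 1 and 2) followed by one padding position.
mask = bidirectional_mask_sketch([[1, 1, 2, 2, 0]], sample_packing=True)
for line in mask[0][0]:
    print(["x" if allowed else "." for allowed in line])
```

The printed grid is block-diagonal: tokens of sample 1 see only each other, tokens of sample 2 see only each other, and the padding row is empty. Unlike a causal mask, each block is full rather than lower-triangular.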

resolve_mask_token_id

integrations.diffusion.utils.resolve_mask_token_id(
    tokenizer,
    cfg,
    *,
    allow_add,
    model=None,
    default_token='<|diffusion_mask|>',
)

Resolve mask token id. Training may add a new special token; inference won’t.
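
The difference between the training and inference paths can be sketched with a toy vocabulary. Everything below is illustrative: `resolve_mask_token_id_sketch` is a hypothetical helper, a plain dict stands in for the tokenizer, and the `cfg` and `model` arguments are left out because their handling is not documented here. Raising an error when the token is missing and `allow_add` is false is an assumption; the actual fallback may differ.

```python
def resolve_mask_token_id_sketch(vocab, *, allow_add,
                                 default_token="<|diffusion_mask|>"):
    """Return the id of the mask token in a toy token -> id vocabulary."""
    if default_token in vocab:
        # The token already exists: reuse its id in both modes.
        return vocab[default_token]
    if allow_add:
        # Training-style path: register a new special token.
        # Assumes ids are contiguous, so the next free id is len(vocab).
        new_id = len(vocab)
        vocab[default_token] = new_id
        return new_id
    # Inference-style path: never mutate the vocabulary.
    # Assumption: fail loudly rather than guess a fallback id.
    raise ValueError(f"{default_token!r} not in vocabulary and allow_add is False")


vocab = {"<pad>": 0, "hello": 1, "world": 2}
train_id = resolve_mask_token_id_sketch(vocab, allow_add=True)
infer_id = resolve_mask_token_id_sketch(vocab, allow_add=False)
print(train_id, infer_id)
```

In this sketch the training call adds the token once, and the later inference call finds the same id without touching the vocabulary.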

shift_logits_to_input_positions

integrations.diffusion.utils.shift_logits_to_input_positions(logits)

Align next-token logits with their input token positions for diffusion.
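
A causal language model's logits at position i score the token at position i + 1, so aligning them with input positions means moving each logit vector one step to the right. The sketch below shows one way to do that on plain nested lists standing in for a [batch_size, seq_len, vocab_size] tensor. The name `shift_logits_sketch` is hypothetical, and reusing the first logit vector for position 0 (which has no preceding prediction) is an assumption about how the boundary is handled.

```python
def shift_logits_sketch(logits):
    """Shift [batch][seq][vocab] nested lists one step right along seq.

    After the shift, entry j holds the logits originally produced at
    position j - 1, i.e. the model's prediction for the token at j.
    """
    # seq[:1] repeats the first vector for position 0 (assumed boundary
    # handling); seq[:-1] drops the last vector, which predicts a token
    # beyond the end of the input.
    return [seq[:1] + seq[:-1] for seq in logits]


# One sequence of three positions with a two-entry vocabulary.
logits = [[[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]]
shifted = shift_logits_sketch(logits)
print(shifted)
```

The sequence length is unchanged, so the shifted logits can be compared position by position with the input token ids.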