integrations.diffusion.utils
Shared utilities for diffusion integration.
Functions
| Name | Description |
|---|---|
| create_bidirectional_attention_mask | Create bidirectional attention mask to override default causal masking. |
| resolve_mask_token_id | Resolve mask token id. Training may add a new special token; inference won’t. |
| shift_logits_to_input_positions | Align next-token logits with their input token positions for diffusion. |
create_bidirectional_attention_mask
integrations.diffusion.utils.create_bidirectional_attention_mask(
input_ids,
attention_mask=None,
sample_packing=False,
)
Create bidirectional attention mask to override default causal masking. Handles sample-packed sequences where different samples are identified by different attention mask values.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| input_ids | torch.Tensor | Input token ids [batch_size, seq_len] | required |
| attention_mask | Optional[torch.Tensor] | Attention mask [batch_size, seq_len] | None |
| sample_packing | bool | Whether sample packing is enabled | False |
Returns
| Name | Type | Description |
|---|---|---|
| bidirectional_mask | torch.Tensor | 4D attention mask [batch_size, 1, seq_len, seq_len] |
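The masking rule described above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: `sketch_bidirectional_mask` is a hypothetical name, nested lists stand in for `torch.Tensor`, and the sketch assumes that packed samples carry distinct positive ids in `attention_mask` with `0` marking padding.

```python
def sketch_bidirectional_mask(input_ids, attention_mask=None, sample_packing=False):
    """Return a [batch_size][1][seq_len][seq_len] nested list of booleans.

    Assumption (not confirmed by the reference): with sample packing, each
    packed sample has its own positive id in attention_mask and padding is 0.
    """
    masks = []
    for b, ids in enumerate(input_ids):
        seq_len = len(ids)
        if attention_mask is None or not sample_packing:
            # No packing: every position may attend to every other position.
            grid = [[True] * seq_len for _ in range(seq_len)]
        else:
            row = attention_mask[b]
            # Packing: position i may attend to j only inside the same sample,
            # and padding positions attend to nothing.
            grid = [
                [row[i] == row[j] and row[i] > 0 for j in range(seq_len)]
                for i in range(seq_len)
            ]
        masks.append([grid])  # singleton "head" dimension
    return masks


# Two packed samples (ids 1 and 2) followed by one padding position.
input_ids = [[11, 12, 21, 22, 0]]
packed = [[1, 1, 2, 2, 0]]
mask = sketch_bidirectional_mask(input_ids, packed, sample_packing=True)
```

In this toy batch, positions 0 and 1 can see each other but not positions 2 and 3, and the padding position sees nothing. With tensors, the same comparison would typically be written as a broadcast equality between the mask and its transpose.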
resolve_mask_token_id
integrations.diffusion.utils.resolve_mask_token_id(
tokenizer,
cfg,
*,
allow_add,
model=None,
default_token='<|diffusion_mask|>',
)
Resolve mask token id. Training may add a new special token; inference won’t.
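The training-versus-inference distinction can be sketched on a toy vocabulary. Everything here is an assumption made for illustration: `sketch_resolve_mask_token_id` is a hypothetical helper, a plain dict stands in for the tokenizer, and returning `None` when the token is missing at inference is a placeholder for whatever fallback the real function applies.

```python
DEFAULT_TOKEN = "<|diffusion_mask|>"


def sketch_resolve_mask_token_id(vocab, *, allow_add, token=DEFAULT_TOKEN):
    """Look up the mask token id in a toy vocab (dict of token -> id).

    Hypothetical behaviour: add the token only when allow_add is True.
    """
    if token in vocab:
        return vocab[token]
    if allow_add:
        # Training path: register a new special token with the next free id.
        vocab[token] = len(vocab)
        return vocab[token]
    # Inference path: never mutate the vocabulary.
    return None


train_vocab = {"<pad>": 0, "hello": 1}
train_id = sketch_resolve_mask_token_id(train_vocab, allow_add=True)

infer_vocab = {"<pad>": 0, "hello": 1}
infer_id = sketch_resolve_mask_token_id(infer_vocab, allow_add=False)
```

With a real tokenizer and model, adding a token also requires the embedding matrix to cover the new id, which this toy omits.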
shift_logits_to_input_positions
integrations.diffusion.utils.shift_logits_to_input_positions(logits)
Align next-token logits with their input token positions for diffusion.
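A causal language model emits, at position t, the logits that predict token t + 1. Aligning them with input positions therefore means shifting the sequence right by one. The sketch below shows one way to do that on nested lists. It is not the library's implementation: `sketch_shift_logits` is a hypothetical name, and duplicating the first row to fill position 0 is an assumption, since no earlier position predicts the first token.

```python
def sketch_shift_logits(logits):
    """Shift [batch][seq_len][vocab] logits right by one along seq_len.

    Assumption: position 0 reuses the first logit row as a placeholder,
    and the last row (a prediction past the sequence end) is dropped.
    """
    return [seq[:1] + seq[:-1] for seq in logits]


# One sequence of three positions over a two-token vocabulary.
logits = [[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]]
shifted = sketch_shift_logits(logits)
```

After the shift, row t holds the prediction that was made for token t, and the sequence length is unchanged.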