integrations.liger.models.base

Generic fused linear cross-entropy (FLCE) patch for untested models similar to Llama.

Functions

Name	Description
lce_forward

lce_forward

integrations.liger.models.base.lce_forward(
    self,
    *args,
    output_attentions=None,
    output_hidden_states=None,
    return_dict=None,
    labels=None,
    logits_to_keep=0,
    skip_logits=None,
    **kwargs,
)

Parameters

Name	Type	Description	Default
labels	`torch.LongTensor` of shape `(batch_size, sequence_length)`, optional	Labels for computing the masked language modeling loss. Indices should either be in `[0, ..., config.vocab_size - 1]` or `-100` (see `input_ids` docstring). Tokens with indices set to `-100` are ignored (masked); the loss is computed only for tokens with labels in `[0, ..., config.vocab_size - 1]`.	`None`
logits_to_keep	`int` or `torch.Tensor`, optional	If an `int`, compute logits for the last `logits_to_keep` tokens. If `0`, compute logits for all `input_ids` (special case). Only the last token's logits are needed for generation, and computing them for that token alone saves memory, which becomes significant for long sequences or large vocabularies. If a `torch.Tensor`, it must be 1D and contain the indices to keep along the sequence-length dimension. This is useful with the packed tensor format (a single dimension for batch and sequence length).	`0`
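
The semantics of `labels` and `logits_to_keep` described above can be illustrated with a small pure-Python sketch. It does not call `lce_forward` and is not the patch's implementation: `select_positions` and `masked_mean_nll` are hypothetical helper names, plain lists stand in for tensors, and the slicing rule is an assumption based on the parameter description.

```python
import math

IGNORE_INDEX = -100  # label value excluded from the loss


def select_positions(sequence, logits_to_keep=0):
    """Return the positions for which logits would be computed (hypothetical helper)."""
    if isinstance(logits_to_keep, int):
        # -0 == 0, so slice(-0, None) equals slice(0, None): 0 keeps every position.
        return sequence[slice(-logits_to_keep, None)]
    # Otherwise treat the argument as a 1D collection of indices to keep.
    return [sequence[i] for i in logits_to_keep]


def masked_mean_nll(log_probs, labels):
    """Mean negative log-likelihood over positions whose label is not -100."""
    kept = [(lp, y) for lp, y in zip(log_probs, labels) if y != IGNORE_INDEX]
    if not kept:
        return 0.0  # choice made for this sketch only
    return -sum(lp[y] for lp, y in kept) / len(kept)


positions = ["t0", "t1", "t2", "t3"]
all_positions = select_positions(positions, 0)          # special case: keep all
last_position = select_positions(positions, 1)          # generation: last token only
picked_positions = select_positions(positions, [0, 2])  # explicit indices

# Two-token vocabulary with uniform predictions; the middle label is ignored.
uniform = [math.log(0.5), math.log(0.5)]
loss = masked_mean_nll([uniform, uniform, uniform], [0, IGNORE_INDEX, 1])
```

Because only two of the three positions carry a real label, the loss is averaged over those two positions. The `0` special case works because negating zero leaves the slice start at the beginning of the sequence.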