loaders.model

Model loader class implementation for loading, configuring, and patching various models.

Classes

Name Description
ModelLoader Manages model configuration, initialization and application of patches during model loading.

ModelLoader

loaders.model.ModelLoader(
    cfg,
    tokenizer,
    *,
    inference=False,
    reference_model=False,
    **kwargs,
)

Manages model configuration, initialization and application of patches during model loading.

This class orchestrates the entire process of loading a model from configuration to final preparation. It handles device mapping, quantization, attention mechanisms, adapter integration, and various optimizations.

The loading process includes:

  • Loading and validating model configuration
  • Applying monkey patches for optimizations / fixes
  • Setting up device mapping (including multi-GPU configurations)
  • Configuring quantization
  • Setting attention mechanisms (Flash Attention, SDPA, etc.)
  • Loading and initializing the model
  • Applying adapters (LoRA, QLoRA, etc.)
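
The ordering of these stages can be sketched as a simple pipeline. This is an illustration only, not the ModelLoader implementation: `LoadState`, the stage functions, the config keys (`device_map`, `load_in_4bit`, `attention`, `adapter`) and the `org/model` name are all hypothetical placeholders, and a plain dict stands in for the model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class LoadState:
    """Hypothetical scratch state shared by the stages (not a ModelLoader type)."""
    cfg: dict
    model_kwargs: dict = field(default_factory=dict)
    model: Any = None
    completed: List[str] = field(default_factory=list)


def load_config(state):
    # Validate the one field every later stage depends on.
    if not state.cfg.get("base_model"):
        raise ValueError("cfg must define base_model")


def apply_patches(state):
    # Placeholder: real patches would run before the model is instantiated.
    pass


def set_device_map(state):
    state.model_kwargs["device_map"] = state.cfg.get("device_map", "auto")


def set_quantization(state):
    # Placeholder key; a real loader would build a quantization config object.
    if state.cfg.get("load_in_4bit"):
        state.model_kwargs["quantization"] = "4bit"


def set_attention(state):
    state.model_kwargs["attn_implementation"] = state.cfg.get("attention", "sdpa")


def build_model(state):
    # A dict stands in for the instantiated model.
    state.model = {"name": state.cfg["base_model"], **state.model_kwargs}


def apply_adapter(state):
    # Adapters are attached last, once the base model exists.
    if state.cfg.get("adapter"):
        state.model["adapter"] = state.cfg["adapter"]


STAGES: List[Callable[[LoadState], None]] = [
    load_config, apply_patches, set_device_map, set_quantization,
    set_attention, build_model, apply_adapter,
]


def run_pipeline(cfg):
    state = LoadState(cfg=cfg)
    for stage in STAGES:
        stage(state)
        state.completed.append(stage.__name__)
    return state


state = run_pipeline({"base_model": "org/model", "adapter": "lora"})
print(state.completed)
```

The point of the sketch is the ordering: keyword arguments are accumulated before the model is built, and the adapter is applied only after the base model exists.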

Attributes

Name Type Description
model PreTrainedModel | PeftModel | PeftMixedModel The loaded model instance (available after load() is called).
model_kwargs dict[str, Any] Dictionary of keyword arguments passed to model initialization.
base_model Name or path of the base model to load.
model_type Type of model to load (e.g., AutoModelForCausalLM).
model_config Configuration object for the model.
auto_model_loader Class used for loading the model (default: AutoModelForCausalLM).
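
The `*` in the constructor signature above makes `inference` and `reference_model` keyword-only. The sketch below shows what that means for callers, using a stand-in class (`SketchLoader`, hypothetical) that copies only the documented parameter list; it does not load anything, and `org/model` is a placeholder name.

```python
class SketchLoader:
    """Stand-in with the same parameter list as the documented constructor."""

    def __init__(self, cfg, tokenizer, *, inference=False,
                 reference_model=False, **kwargs):
        self.cfg = cfg
        self.tokenizer = tokenizer
        self.inference = inference
        self.reference_model = reference_model
        self.extra = kwargs  # any additional keyword arguments


# Flags after the bare `*` must be passed by name.
loader = SketchLoader({"base_model": "org/model"}, None, inference=True)

# Passing a flag positionally is rejected by Python itself.
try:
    SketchLoader({"base_model": "org/model"}, None, True)
    positional_rejected = False
except TypeError:
    positional_rejected = True

print(loader.inference, loader.reference_model, positional_rejected)
```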

Methods

Name Description
load Load and prepare the model with all configurations and patches.

load

loaders.model.ModelLoader.load()

Load and prepare the model with all configurations and patches.

Returns

Type Description
tuple[PreTrainedModel | PeftModelForCausalLM, PeftConfig | None] A tuple containing the loaded model and its PEFT (e.g., LoRA) configuration, or None if no adapter is applied.
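
Because the second element of the returned tuple may be None, callers typically unpack the result and branch on it. The sketch below shows that pattern with stand-in objects so it runs without loading a model; `describe_load_result`, `FakeModel` and `FakeLoraConfig` are hypothetical names, not part of loaders.model.

```python
from typing import Any, Optional, Tuple


class FakeModel:
    """Stand-in for the loaded model."""


class FakeLoraConfig:
    """Stand-in for a PEFT configuration object."""


def describe_load_result(result: Tuple[Any, Optional[Any]]) -> str:
    # With a real loader this tuple would come from:
    #     model, peft_config = ModelLoader(cfg, tokenizer).load()
    model, peft_config = result
    if peft_config is None:
        # No adapter was applied; only the model is available.
        return f"{type(model).__name__} without adapter"
    return f"{type(model).__name__} with {type(peft_config).__name__}"


plain = describe_load_result((FakeModel(), None))
adapted = describe_load_result((FakeModel(), FakeLoraConfig()))
print(plain)
print(adapted)
```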