FAQ
General
Q: The trainer stopped and hasn’t progressed in several minutes.
A: This is usually an issue with the GPUs communicating with each other. See the NCCL doc.
Q: Exitcode -9
A: The process was killed (exit code -9 corresponds to SIGKILL), which usually happens when you run out of system RAM.
Q: Exitcode -7 while using deepspeed
A: Try upgrading deepspeed with:

```bash
pip install -U deepspeed
```
Q: AttributeError: 'DummyOptim' object has no attribute 'step'
Q: ModuleNotFoundError: No module named 'mpi4py' when using deepspeed with a single GPU
A: Both errors can occur when deepspeed is enabled but you are running on a single GPU. Please remove the `deepspeed:` section from the YAML file or drop the `--deepspeed` CLI flag.
Q: The code is stuck on saving preprocessed datasets.
A: This is usually a GPU issue. It can be resolved by setting the environment variable `CUDA_VISIBLE_DEVICES=0`. If you are on RunPod, this is usually a pod issue; starting a new pod should take care of it.
Q: Received a torch.Size mismatch error between the checkpoint and the model when merging or loading adapters.
A: This is likely due to a vocab size mismatch. By default, Axolotl expands the model's embeddings if the tokenizer has more tokens than the model. Please use the `axolotl merge-lora` command to merge the adapters instead of using your own scripts.
On the other hand, if the model has more tokens than the tokenizer, Axolotl does not shrink the model's embeddings unless `shrink_embeddings: true` is set in the config.
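A minimal sketch of enabling that option (assuming `shrink_embeddings` sits at the top level of the config, as most Axolotl options do):

```yaml
# Assumed placement: top level of the Axolotl config.
# Lets Axolotl shrink the model's embeddings when the model has more
# tokens than the tokenizer.
shrink_embeddings: true
```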
Q: How do I call Axolotl from custom Python scripts?
A: Since Axolotl is just Python, see `src/axolotl/cli/main.py` for how each command is invoked.
Q: How do I know which value to use for `fsdp_transformer_layer_cls_to_wrap`?
A: This is the class name of the transformer layer to wrap with FSDP. For example, for `LlamaForCausalLM`, the value is `LlamaDecoderLayer`. To find this for a specific model, check the model's `PreTrainedModel` definition and look for the `_no_split_modules` variable in the `modeling_<model_name>.py` file within the `transformers` library.
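Once found, the class name goes into the FSDP section of the config. A sketch, with the surrounding FSDP option names assumed from common Axolotl examples and a Llama-family model assumed:

```yaml
# Sketch only: the `fsdp` / `fsdp_config` option names are assumed here.
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  # Class name taken from the model's `_no_split_modules`
  # (e.g. LlamaDecoderLayer for LlamaForCausalLM).
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
```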
Q: ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token
A: This is because the tokenizer does not have a padding token. Please add a padding token to the tokenizer via:
```yaml
special_tokens:
  # str. If you're not sure, set to same as `eos_token`.
  pad_token: "..."
```
Chat templates
Q: jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content' / 'role' / ____
A: This means that the property mapping for the stated attribute does not exist when building the `chat_template` prompt. For example, for `no attribute 'content'`, please check that you have added the correct mapping for `content` under `message_property_mappings`.
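A minimal sketch of such a mapping, assuming a dataset that stores the role in a `from` field and the text in a `value` field:

```yaml
datasets:
  - path: your_dataset  # placeholder
    type: chat_template
    # Map the template's expected `role`/`content` properties to the
    # field names actually used in the dataset (assumed here).
    message_property_mappings:
      role: from
      content: value
```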
Q: Empty template generated for turn ___
A: The `content` is empty for that turn.
Q: Could not find content start/end boundary for turn __
A: The specific turn's start/end could not be detected. Please ensure you have set the `eos_token` to match your `chat_template`. Otherwise, this could be a `chat_template` that doesn't use proper boundaries for each turn (such as the system turn). In rare cases, make sure your content is not `[[dummy_message]]`; if you hit this, please let us know.
Q: Content end boundary is before start boundary for turn ___
A: This is an edge case which should not occur. Please create an Issue if this happens.
Q: Content end boundary is the same as start boundary for turn ___. This is likely an empty turn.
A: The content for that turn is empty; check the corresponding message in your dataset.
Q: The EOS/EOT token is incorrectly being masked or not being masked.
A: This is caused by a mismatch between `tokenizer.eos_token` and the EOS/EOT token in the template. Please make sure to set `eos_token` under `special_tokens` to the same EOS/EOT token used in the template.
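For example, if the chat template ends turns with ChatML's `<|im_end|>` (assumed here purely for illustration):

```yaml
special_tokens:
  # Match the end-of-turn token used by your chat template
  # (`<|im_end|>` is an assumed ChatML-style example).
  eos_token: "<|im_end|>"
```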
Q: "chat_template choice is tokenizer_default but tokenizer's chat_template is null. Please add a chat_template in tokenizer config"
A: This is because the tokenizer does not have a chat template. Please add a chat template in the tokenizer config. See the chat_template docs for more details.
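Alternatively, a named template can be selected in the Axolotl config instead of `tokenizer_default`; a minimal sketch (ChatML is assumed here as an example):

```yaml
# Use a built-in template instead of relying on the tokenizer's
# (`chatml` is an assumed example; pick one that matches your model).
chat_template: chatml
```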