KeyError: Parameter containing #2205

Open · Amerehei opened this issue Nov 8, 2024 · 5 comments
Amerehei commented Nov 8, 2024

I want to run the SFT example but I get some errors. Can you help me find the problem?

I run run_peft_fsdp.sh with --model_name_or_path "meta-llama/Llama-2-7b-hf" (I used a smaller model, just for testing purposes).

I use the pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04 image. Here are my environment details and the errors:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Packages
Name: transformers
Version: 4.47.0.dev0
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: [email protected]
License: Apache 2.0 License
Location: /usr/local/lib/python3.11/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: peft, trl, unsloth_zoo
---
Name: accelerate
Version: 1.1.0.dev0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /usr/local/lib/python3.11/dist-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: peft, trl, unsloth_zoo
---
Name: peft
Version: 0.13.3.dev0
Summary: Parameter-Efficient Fine-Tuning (PEFT)
Home-page: https://github.com/huggingface/peft
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /usr/local/lib/python3.11/dist-packages
Requires: accelerate, huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch, tqdm, transformers
Required-by: unsloth_zoo
---
Name: trl
Version: 0.13.0.dev0
Summary: Train transformer language models with reinforcement learning.
Home-page: https://github.com/huggingface/trl
Author: Leandro von Werra
Author-email: [email protected]
License: Apache 2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: accelerate, datasets, rich, transformers
Required-by: unsloth_zoo
---
Name: datatrove
Version: 0.3.0
Summary: HuggingFace library to process and filter large amounts of webdata
Home-page:
Author:
Author-email: "HuggingFace Inc." 
License: Apache-2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: dill, fsspec, huggingface-hub, humanize, loguru, multiprocess, numpy, tqdm
Required-by:
---
Name: unsloth
Version: 2024.11.5
Summary: 2-5X faster LLM finetuning
Home-page: http://www.unsloth.ai
Author: Unsloth AI team
Author-email: [email protected]
---
Name: deepspeed
Version: 0.15.3
Summary: DeepSpeed library
Home-page: http://deepspeed.ai
Author: DeepSpeed Team
Author-email: [email protected]
License: Apache Software License 2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: hjson, msgpack, ninja, numpy, nvidia-ml-py, packaging, psutil, py-cpuinfo, pydantic, torch, tqdm
Required-by:
---
Name: PyGithub
Version: 2.5.0
Summary: Use the full Github API v3
Home-page:
Author:
Author-email: Vincent Jacques 
License:
Location: /usr/local/lib/python3.11/dist-packages
Requires: Deprecated, pyjwt, pynacl, requests, typing-extensions, urllib3
Required-by:
---
Name: flash-attn
Version: 2.6.3
Summary: Flash Attention: Fast and Memory-Efficient Exact Attention
Home-page: https://github.com/Dao-AILab/flash-attention
Author: Tri Dao
Author-email: [email protected]
License:
Location: /usr/local/lib/python3.11/dist-packages
Requires: einops, torch
Required-by:
---
Name: huggingface-hub
Version: 0.26.2
Summary: Client library to download and publish models, datasets and other repos on the huggingface.co hub
Home-page: https://github.com/huggingface/huggingface_hub
Author: Hugging Face, Inc.
Author-email: [email protected]
License: Apache
Location: /usr/local/lib/python3.11/dist-packages
Requires: filelock, fsspec, packaging, pyyaml, requests, tqdm, typing-extensions
Required-by: accelerate, datasets, datatrove, evaluate, peft, tokenizers, transformers, unsloth_zoo
---
Name: evaluate
Version: 0.4.3
Summary: HuggingFace community-driven open-source library of evaluation
Home-page: https://github.com/huggingface/evaluate
Author: HuggingFace Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: datasets, dill, fsspec, huggingface-hub, multiprocess, numpy, packaging, pandas, requests, tqdm, xxhash
Required-by:
---
Name: datasets
Version: 3.1.0
Summary: HuggingFace community-driven open-source library of datasets
Home-page: https://github.com/huggingface/datasets
Author: HuggingFace Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: aiohttp, dill, filelock, fsspec, huggingface-hub, multiprocess, numpy, packaging, pandas, pyarrow, pyyaml, requests, tqdm, xxhash
Required-by: evaluate, trl, unsloth_zoo
---
Name: bitsandbytes
Version: 0.44.1
Summary: k-bit optimizers and matrix multiplication routines.
Home-page: https://github.com/TimDettmers/bitsandbytes
Author: Tim Dettmers
Author-email: [email protected]
License: MIT
Location: /usr/local/lib/python3.11/dist-packages
Requires: numpy, torch
Required-by:
---
Name: einops
Version: 0.8.0
Summary: A new flavour of deep learning operations
Home-page: https://github.com/arogozhnikov/einops
Author: Alex Rogozhnikov
Author-email:
License: MIT
Location: /usr/local/lib/python3.11/dist-packages
Requires:
Required-by: flash-attn
---
Name: wandb
Version: 0.18.6
Summary: A CLI and library for interacting with the Weights & Biases API.
Home-page:
Author:
Author-email: Weights & Biases 
---
Name: pandas
Version: 2.2.3
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author:
Author-email: The Pandas Development Team 
---
Name: numpy
Version: 1.26.3
Summary: Fundamental package for array computing in Python
Home-page: https://numpy.org
Author: Travis E. Oliphant et al.
Required-by: accelerate, bitsandbytes, contourpy, datasets, datatrove, deepspeed, evaluate, matplotlib, pandas, peft, scikit-learn, scipy, tensorboard, torchvision, transformers, unsloth_zoo, xformers
---
Name: scipy
Version: 1.14.1
Summary: Fundamental algorithms for scientific computing in Python
Home-page: https://scipy.org/
Author:
Author-email:
---
Name: sentencepiece
Version: 0.2.0
Summary: SentencePiece python wrapper
Home-page: https://github.com/google/sentencepiece
Author: Taku Kudo
Author-email: [email protected]
License: Apache
Location: /usr/local/lib/python3.11/dist-packages
Requires:
Required-by: unsloth_zoo
---
Name: nltk
Version: 3.9.1
Summary: Natural Language Toolkit
Home-page: https://www.nltk.org/
Author: NLTK Team
Author-email: [email protected]
License: Apache License, Version 2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: click, joblib, regex, tqdm
Required-by:
---
Name: xformers
Version: 0.0.28.post3
Summary: XFormers: A collection of composable Transformer building blocks.
Home-page: https://facebookresearch.github.io/xformers/
Author: Facebook AI Research
Author-email: [email protected]
License:
Location: /usr/local/lib/python3.11/dist-packages
Requires: numpy, torch
Required-by:
---
Name: hf_transfer
Version: 0.1.8
Summary: Speed up file transfers with the Hugging Face Hub.
Home-page:
Author:
Author-email:
License:
Location: /usr/local/lib/python3.11/dist-packages
Requires:
Required-by: unsloth_zoo
---
Name: scikit-learn
Version: 1.5.2
Summary: A set of python modules for machine learning and data mining
Home-page: https://scikit-learn.org
Author:
Author-email:
License: BSD 3-Clause License
Amerehei (Author) commented Nov 8, 2024

Log
accelerate launch --config_file "configs/fsdp_config.yaml" train.py \
    --seed 100 \
    --model_name_or_path "meta-llama/Llama-2-7b-hf" \
    --dataset_name "smangrul/ultrachat-10k-chatml" \
    --chat_template_format "chatml" \
    --add_special_tokens False \
    --append_concat_token False \
    --splits "train,test" \
    --max_seq_len 2048 \
    --num_train_epochs 1 \
    --logging_steps 5 \
    --log_level "info" \
    --logging_strategy "steps" \
    --eval_strategy "epoch" \
    --save_strategy "epoch" \
    --push_to_hub \
    --hub_private_repo True \
    --hub_strategy "every_save" \
    --bf16 True \
    --packing True \
    --learning_rate 1e-4 \
    --lr_scheduler_type "cosine" \
    --weight_decay 1e-4 \
    --warmup_ratio 0.0 \
    --max_grad_norm 1.0 \
    --output_dir "mistral-sft-lora-fsdp" \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --gradient_checkpointing True \
    --use_reentrant False \
    --dataset_text_field "content" \
    --use_flash_attn True \
    --use_peft_lora True \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.1 \
    --lora_target_modules "all-linear" \
    --use_4bit_quantization False
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 609/609 [00:00<00:00, 1.96MB/s]
model.safetensors.index.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26.8k/26.8k [00:00<00:00, 69.7MB/s]
model-00001-of-00002.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.98G/9.98G [03:51<00:00, 43.0MB/s]
model-00002-of-00002.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.50G/3.50G [00:17<00:00, 202MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [04:09<00:00, 124.72s/it]
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
(the two warnings above and the "Downloading shards" progress line are repeated once per rank; duplicates omitted)
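The first warning also points at the fix: load the model in a half-precision dtype before enabling Flash Attention 2. A minimal sketch of what that looks like for this run (model name taken from the launch command; bfloat16 chosen to match the `--bf16 True` flag):

```python
import torch
from transformers import AutoModelForCausalLM

# Load in bfloat16 so Flash Attention 2 sees a supported dtype;
# per the warning above, it only accepts float16 and bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```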
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  4.50it/s]
(one such line per rank; duplicates omitted)
generation_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 188/188 [00:00<00:00, 1.53MB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 776/776 [00:00<00:00, 7.62MB/s]
tokenizer.model: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 38.8MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 16.9MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 1.49MB/s]
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
(repeated once per rank; duplicates omitted)
Loading checkpoint shards:  50%|██████████████████████████████████████████████████████████████████████▌                                                                      | 1/2 [00:07<00:07,  7.58s/it]
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
(the lm_head warning is repeated once per rank; duplicates omitted)
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.09s/it]
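These messages are expected: the script adds the ChatML special tokens and resizes the embedding matrix and lm_head to fit. If the mean/covariance initialization is unwanted, the warning says how to turn it off; roughly like this (a sketch, with `model` and `tokenizer` already loaded as above):

```python
# Resize the vocabulary after adding the ChatML special tokens.
# mean_resizing=False skips the multivariate-normal initialization
# described in the warning and uses the default init instead.
model.resize_token_embeddings(len(tokenizer), mean_resizing=False)
```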
README.md: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 524/524 [00:00<00:00, 4.88MB/s]
train-00000-of-00001.parquet: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35.2M/35.2M [00:00<00:00, 42.5MB/s]
test-00000-of-00001.parquet: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.08M/7.08M [00:00<00:00, 42.3MB/s]
Generating train split: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 44326.80 examples/s]
Generating test split: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2000/2000 [00:00<00:00, 49700.55 examples/s]
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 761.49 examples/s]
Size of the train set: 10. Size of the validation set: 10
A sample of train dataset: {'content': "<|im_start|>user\nThese instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?<|im_end|>\n<|im_start|>assistant\nThis feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.<|im_end|>\n<|im_start|>user\nCan you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?<|im_end|>\n<|im_start|>assistant\nSure, here are the steps to enable the secondary image hover feature on your Collection pages and Featured Collections sections:\n\n1. Log in to your Shopify account and go to your Online Store.\n2. Click on Customize theme for the section-based theme you are using.\n3. Navigate to the Collection pages or Featured Collections section where you want to enable the secondary image hover feature.\n4. With the section open, click on the section's setting (gear) icon in the top-left corner.\n5. In the settings panel that appears, look for an option labeled 'Image display' or 'Image hover'.\n6. If available, select 'Show secondary image on hover'.\n7. Save the changes and preview the Collection/Featured Collection page to see the effect.\n\nIf you're having trouble finding the setting, the best thing to do is refer to your theme's documentation, since the location and labeling of settings can vary between themes.<|im_end|>\n<|im_start|>user\nCan you provide me with a link to the documentation for my theme?<|im_end|>\n<|im_start|>assistant\nI don't have access to your store's theme information. However, you can usually find the documentation for your theme by going to the shopify theme store, finding your theme and clicking on the 'support' link located in the bottom right corner of the page. Alternatively, you can do a google search for the name of your theme followed by 'documentation' or 'user guide'.<|im_end|>\n<|im_start|>user\nCan you confirm if this feature also works for the Quick Shop section of my theme?<|im_end|>\n<|im_start|>assistant\nThe secondary image hover feature may or may not work for your Quick Shop section, depending on the configuration of your theme. Some themes include this feature in the Quick Shop section by default, while others may require additional customization. To check if this feature is available for the Quick Shop section of your theme, follow these steps:\n\n1. Go to the Quick Shop section where you would like to enable the feature. 2. Click on the Quick Shop settings icon (gear icon) and look for 'Image display' or 'Image hover'. 3. If available, select 'Show secondary image on hover'. 4. Save the changes. If this option is not available in your Quick Shop section settings, you may need to reach out to your theme developer for assistance with customizing your Quick Shop section to include this feature.<|im_end|>\n"}
(the Map progress, dataset sizes, and the sample above are printed once per rank; duplicates omitted)
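For context, the sample above is a conversation rendered with a ChatML template (`--chat_template_format "chatml"`). A hypothetical sketch of that rendering, not the script's exact code:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# Install a minimal ChatML template (the SFT script does something similar).
tokenizer.chat_template = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
)
messages = [
    {"role": "user", "content": "What theme version am I using?"},
    {"role": "assistant", "content": "This feature only applies to Collection pages."},
]
# Produces "<|im_start|>user\n...<|im_end|>\n<|im_start|>assistant\n...<|im_end|>\n"
print(tokenizer.apply_chat_template(messages, tokenize=False))
```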
[rank3]:[W1108 16:43:16.179385976 ProcessGroupNCCL.cpp:4115] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
(every rank emits the same barrier warning for its own GPU; duplicates omitted)
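The warning itself suggests the remedy: bind each rank to its GPU when the process group is created. Under accelerate the launcher manages this, but at the raw torch.distributed level the suggested fix would look roughly like this (a sketch; `LOCAL_RANK` is the per-process env var set by the launcher):

```python
import os
import torch
import torch.distributed as dist

# Pin this process to its GPU and pass the device to NCCL up front,
# so barrier() no longer has to guess the rank-to-GPU mapping.
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
dist.init_process_group(
    backend="nccl",
    device_id=torch.device(f"cuda:{local_rank}"),
)
```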
Generating train split: 8 examples [00:00, 291.34 examples/s]
Generating train split: 8 examples [00:00, 541.47 examples/s]
[2024-11-08 16:43:58,359] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /root/.triton/autotune: No such file or directory
[2024-11-08 16:43:58,441] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 16:43:58,446] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 16:43:58,447] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 16:43:58,516] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 16:43:58,578] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 16:43:58,629] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 16:43:58,689] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Using auto half precision backend
trainable params: 19,988,480 || all params: 6,758,469,632 || trainable%: 0.2958
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32008, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaFlashAttention2(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (v_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (o_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=11008, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (up_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=11008, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (down_proj): lora.Linear(
                (base_layer): Linear(in_features=11008, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=11008, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
            (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
          )
        )
        (norm): LlamaRMSNorm((4096,), eps=1e-05)
        (rotary_emb): LlamaRotaryEmbedding()
      )
      (lm_head): Linear(in_features=4096, out_features=32008, bias=False)
    )
  )
)
[... the same "trainable params" line is printed once by each of the 8 ranks ...]
***** Running training *****
Num examples = 8
Num Epochs = 1
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 256
Gradient Accumulation steps = 4
Total optimization steps = 1
Number of trainable parameters = 2,498,560
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 2
wandb: You chose 'Use an existing W&B account'
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Tracking run with wandb version 0.18.6
wandb: Run data is saved locally in /workspace/wandb/run-20241108_164934-d2cvs1zs
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run mistral-sft-lora-fsdp
wandb: ⭐️ View project at https://wandb.ai/a-amerehi/huggingface
wandb: 🚀 View run at https://wandb.ai/a-amerehi/huggingface/runs/d2cvs1zs
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.91s/it]
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
warnings.warn(
[... the same FSDP.state_dict_type FutureWarning is emitted once per rank (8 ranks total) ...]
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
local_shape = tensor.shape
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:744: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.shape,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:746: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.dtype,
/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:747: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
tensor.device,
[... the same four ShardedTensor-deprecation FutureWarnings are emitted once per rank (8 ranks total) ...]
/usr/local/lib/python3.11/dist-packages/peft/utils/save_and_load.py:260: UserWarning: Setting `save_embedding_layers` to `True` as the embedding layer has been resized during finetuning.
warnings.warn(
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/config.json
[... the same save_embedding_layers UserWarning is emitted once per rank (8 ranks total) ...]
Model config LlamaConfig {
"_name_or_path": "meta-llama/Llama-2-7b-hf",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.47.0.dev0",
"use_cache": true,
"vocab_size": 32000
}

/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py:108: FutureWarning: save_state_dict is deprecated and will be removed in future versions.Please use save instead.
dist_cp.save_state_dict(
[... the same save_state_dict FutureWarning is emitted once per rank (8 ranks total) ...]
[... identical tracebacks from rank1 through rank7, plus an unprefixed copy from the local process, omitted; they differ only in the rank number and cuda device. Representative rank0 traceback: ...]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspace/train.py", line 155, in <module>
[rank0]:     main(model_args, data_args, training_args)
[rank0]:   File "/workspace/train.py", line 139, in main
[rank0]:     trainer.train(resume_from_checkpoint=checkpoint)
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2132, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2562, in _inner_training_loop
[rank0]:     self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3025, in _maybe_log_save_evaluate
[rank0]:     self._save_checkpoint(model, trial)
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3160, in _save_checkpoint
[rank0]:     self._save_optimizer_and_scheduler(output_dir)
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3276, in _save_optimizer_and_scheduler
[rank0]:     save_fsdp_optimizer(
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/accelerate/utils/fsdp_utils.py", line 186, in save_fsdp_optimizer
[rank0]:     optim_state = FSDP.optim_state_dict(model, optimizer)
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1890, in optim_state_dict
[rank0]:     return FullyShardedDataParallel._optim_state_dict_impl(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1301, in _optim_state_dict_impl
[rank0]:     return _optim_state_dict(
[rank0]:            ^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 2015, in _optim_state_dict
[rank0]:     fsdp_osd["param_groups"] = _unflatten_param_groups(
[rank0]:                                ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1271, in _unflatten_param_groups
[rank0]:     nested_unflat_param_names = [
[rank0]:                                 ^
[rank0]:   File "/usr/local/lib/python3.11/dist-packages/torch/distributed/fsdp/_optim_utils.py", line 1272, in <listcomp>
[rank0]:     param_to_fqns[param] for param in param_group_params
[rank0]:     ~~~~~~~~~~~~~^^^^^^^
[rank0]: KeyError: Parameter containing:
[rank0]: tensor([[ 0.0007, -0.0035, -0.0132,  ...,  0.0048,  0.0075, -0.0131],
[rank0]:         [-0.0077,  0.0071,  0.0069,  ...,  0.0037,  0.0114, -0.0142],
[rank0]:         [-0.0058,  0.0103, -0.0030,  ..., -0.0134,  0.0156,  0.0019],
[rank0]:         ...,
[rank0]:         [ 0.0084,  0.0016, -0.0019,  ..., -0.0135, -0.0142, -0.0084],
[rank0]:         [-0.0133, -0.0083,  0.0022,  ..., -0.0101,  0.0025, -0.0026],
[rank0]:         [ 0.0148, -0.0037,  0.0084,  ..., -0.0073, -0.0091,  0.0124]],
[rank0]:        device='cuda:0', requires_grad=True)
wandb: 🚀 View run mistral-sft-lora-fsdp at: https://wandb.ai/a-amerehi/huggingface/runs/d2cvs1zs
wandb: Find logs at: wandb/run-20241108_164934-d2cvs1zs/logs
W1108 16:49:59.928000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2243 closing signal SIGTERM
W1108 16:49:59.930000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2244 closing signal SIGTERM
W1108 16:49:59.930000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2245 closing signal SIGTERM
W1108 16:49:59.931000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2247 closing signal SIGTERM
W1108 16:49:59.931000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2248 closing signal SIGTERM
W1108 16:49:59.931000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2249 closing signal SIGTERM
W1108 16:49:59.931000 2163 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 2250 closing signal SIGTERM
E1108 16:50:01.148000 2163 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 3 (pid: 2246) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 1155, in launch_command
multi_gpu_launcher(args)
File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 793, in multi_gpu_launcher
distrib_run.run(args)
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time      : 2024-11-08_16:49:59
host      : e3997253d925
rank      : 3 (local_rank: 3)
exitcode  : 1 (pid: 2246)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
   </pre>
</details>
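
For reference, the adapter structure in the module dump above corresponds to a LoRA setup roughly like the following. This is a minimal sketch reconstructed from the printed model, not the exact arguments the example script passes; in particular `lora_alpha` is my own assumption, since it is not visible in the dump:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Values read off the module dump above: r=8, dropout=0.1, LoRA on all
# attention and MLP projections.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,  # assumption: alpha does not appear in the printed model
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # should report trainable params: 19,988,480
```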

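As a sanity check (my own arithmetic, not output from the script): the trainable-parameter counts in the log are consistent with r=8 LoRA on seven projections per layer, and the smaller number in the training banner matches that total divided across the 8 ranks:

```python
# Per-layer LoRA parameter count, r=8; shapes taken from the module dump above.
attn = 4 * (4096 * 8 + 8 * 4096)                            # q/k/v/o projections
mlp = 2 * (4096 * 8 + 8 * 11008) + (11008 * 8 + 8 * 4096)   # gate/up + down projections
total = 32 * (attn + mlp)
print(total)       # 19988480 -> matches "trainable params" in the log
print(total // 8)  # 2498560  -> matches "Number of trainable parameters"
```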

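The KeyError itself is raised because FSDP's `param_to_fqns` map has no entry for one trainable parameter when accelerate calls `FSDP.optim_state_dict`. A hypothetical way to see which parameter the optimizer holds that the wrapped model no longer reports, e.g. from a debugger at the failing frame, would be something like this sketch (`model` and `optimizer` here stand for the two arguments passed to `save_fsdp_optimizer` in accelerate/utils/fsdp_utils.py):

```python
# Hypothetical debugging sketch: find optimizer parameters that are missing
# from the FSDP-wrapped model's named_parameters(), which is where the
# param_to_fqns KeyError originates.
named = {p: n for n, p in model.named_parameters()}
for group in optimizer.param_groups:
    for p in group["params"]:
        if p not in named:
            # Should print the same shape/device as the tensor in the KeyError.
            print("orphaned optimizer param:", tuple(p.shape), p.device, p.requires_grad)
```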
@Amerehei
Author

Amerehei commented Nov 8, 2024

I have one more question: why are there so many warnings in the log, especially deprecation warnings?
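
If they are pure noise, I assume something like the following would filter them (a sketch only, untested; the module patterns are my own guesses), though it obviously doesn't explain why the deprecated code paths are being hit:

```python
import warnings

# Sketch: suppress the repeated deprecation noise from torch FSDP and
# accelerate seen in the log above.
warnings.filterwarnings("ignore", category=FutureWarning,
                        module=r"torch\.distributed\.fsdp.*")
warnings.filterwarnings("ignore", category=FutureWarning,
                        module=r"accelerate\.utils\.fsdp_utils")
```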

@Amerehei
Author

@BenjaminBossan @qgallouedec
Any idea?

@BenjaminBossan
Member

Sorry for the delay in replying, @Amerehei; we're currently at a company offsite. Hopefully at the start of next week I'll have the opportunity to try to reproduce, and I will report back.

@Amerehei
Author

Thanks, Benjamin, for your response.
