1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
After fine-tuning the model by running sh shell/internvl2.5/2nd_finetune/internvl2_5_4b_dynamic_res_2nd_finetune_lora.sh, I cannot load the result directly with the following code:
path = './InternVL2_5-4B-lora'
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
Loading fails with NotImplementedError: Cannot copy out of meta tensor; no data!. I also merged the LoRA weights following "Enhancing InternVL2 on COCO Caption Using LoRA Fine-Tuning", and the merged model raises the same error.
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "~/InternVL_process/inference_with_transformers.py", line 160, in <module>
model = AutoModel.from_pretrained(
File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2567, in cuda
return super().cuda(*args, **kwargs)
File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 918, in cuda
return self._apply(lambda t: t.cuda(device))
File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
[Previous line repeated 3 more times]
File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 833, in _apply
param_applied = fn(param)
File "~/miniconda3/envs/internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 918, in <lambda>
return self._apply(lambda t: t.cuda(device))
NotImplementedError: Cannot copy out of meta tensor; no data!
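One way to narrow this down is to list which parameters are still on the meta device before calling .cuda(). The following is a minimal sketch, not a confirmed diagnosis: it assumes the model is first loaded without the trailing .eval().cuda(), and it relies only on the is_meta attribute that PyTorch tensors expose. The helper name find_meta_parameters and the parameter names in the stand-in data are hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def find_meta_parameters(named_params: Iterable[Tuple[str, object]]) -> List[str]:
    """Return the names of parameters that report is_meta == True."""
    return [name for name, param in named_params if getattr(param, "is_meta", False)]


# Stand-in objects so the sketch runs without PyTorch installed.
# The names below are hypothetical, not taken from the real checkpoint.
fake_params = [
    ("vision_model.embeddings.weight", SimpleNamespace(is_meta=False)),
    ("language_model.lm_head.weight", SimpleNamespace(is_meta=True)),
]

meta_names = find_meta_parameters(fake_params)
print(meta_names)
```

With a real model, the call would be find_meta_parameters(model.named_parameters()). If the returned names match the weights mentioned in the "You should probably TRAIN this model" warning, that would suggest those weights were never filled from the checkpoint, which is one plausible reason .cuda() has no data to copy.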
When merging the LoRA weights, adding device_map="auto" to the InternVLChatModel.from_pretrained call lets the merge complete, but inference with the merged model is much slower than with the model before fine-tuning.
Reproduction
set -x
GPUS=${GPUS:-4}
BATCH_SIZE=${BATCH_SIZE:-4}
PER_DEVICE_BATCH_SIZE=${PER_DEVICE_BATCH_SIZE:-1}
GRADIENT_ACC=$((BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS))
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
export MASTER_PORT=34229
export TF_CPP_MIN_LOG_LEVEL=3
export LAUNCHER=pytorch
OUTPUT_DIR='./internvl_checkpoints/internvl_chat_v2_5/internvl2_5_4b_dynamic_res_2nd_finetune_lora'
if [ ! -d "$OUTPUT_DIR" ]; then
mkdir -p "$OUTPUT_DIR"
fi
# number of gpus: 2
# batch size per gpu: 4
# gradient accumulation steps: 2
# total batch size: 16
# epoch: 1
torchrun \
  --nnodes=1 \
  --node_rank=0 \
  --master_addr=127.0.0.1 \
  --nproc_per_node=${GPUS} \
  --master_port=${MASTER_PORT} \
  internvl/train/internvl_chat_finetune.py \
  --model_name_or_path "OpenGVLab/InternVL2_5-4B" \
  --conv_style "internvl2_5" \
  --use_fast_tokenizer False \
  --output_dir ${OUTPUT_DIR} \
  --meta_path "./InternVL_process/finetune_lora_json/ng-4-1-ok-4-1-bbox.json" \
  --overwrite_output_dir True \
  --force_image_size 448 \
  --max_dynamic_patch 6 \
  --down_sample_ratio 0.5 \
  --drop_path_rate 0.0 \
  --freeze_llm True \
  --freeze_mlp True \
  --freeze_backbone True \
  --use_llm_lora 16 \
  --vision_select_layer -1 \
  --dataloader_num_workers 4 \
  --bf16 True \
  --num_train_epochs 1 \
  --per_device_train_batch_size ${PER_DEVICE_BATCH_SIZE} \
  --gradient_accumulation_steps ${GRADIENT_ACC} \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 200 \
  --save_total_limit 1 \
  --learning_rate 4e-5 \
  --weight_decay 0.01 \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --max_seq_length 8192 \
  --do_train True \
  --grad_checkpoint True \
  --group_by_length True \
  --dynamic_image_size True \
  --use_thumbnail True \
  --ps_version 'v2' \
  --deepspeed "zero_stage1_config.json" \
  --report_to "tensorboard" \
  2>&1 | tee -a "${OUTPUT_DIR}/training_log.txt"
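For reference, the GRADIENT_ACC arithmetic in the script can be checked by hand. The sketch below mirrors the shell expression BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS using integer division; the function name is my own, not part of the InternVL code. It shows that the script defaults (4, 1, 4) and the values in the comment header (16, 4, 2) give different results, so the header does not describe the defaults.

```python
def gradient_accumulation_steps(total_batch_size: int,
                                per_device_batch_size: int,
                                num_gpus: int) -> int:
    """Mirror the shell arithmetic $((BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS))."""
    return total_batch_size // per_device_batch_size // num_gpus


# Defaults from the script: BATCH_SIZE=4, PER_DEVICE_BATCH_SIZE=1, GPUS=4
steps_defaults = gradient_accumulation_steps(4, 1, 4)

# Values from the comment header: total 16, 4 per GPU, 2 GPUs
steps_header = gradient_accumulation_steps(16, 4, 2)

# Integer division truncates: more GPUs than the batch size allows gives 0
steps_truncated = gradient_accumulation_steps(4, 1, 8)

print(steps_defaults, steps_header, steps_truncated)  # 1 2 0
```

A result of 0 is not a usable value for --gradient_accumulation_steps, so BATCH_SIZE should be at least PER_DEVICE_BATCH_SIZE times GPUS when overriding the defaults.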
Environment
Error traceback