Description
System Info
transformers 4.54.0
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
from transformers.models.llava_next import LlavaNextForConditionalGeneration, LlavaNextProcessor
from PIL import Image
import requests
import torch

# Load the model with the "full" feature-selection strategy.
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-vicuna-7b-hf",
    vision_feature_select_strategy="full",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-vicuna-7b-hf")

image = Image.open("/data/coco/train2017/000000000009.jpg")
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(
    images=image,
    text=prompt,
    truncation=True,
    return_tensors="pt",
    vision_feature_select_strategy="full",
).to("cuda")

# Forward pass; this is the call that raises the RuntimeError.
input_embeds = model(
    inputs.input_ids,
    pixel_values=inputs.pixel_values,
    image_sizes=inputs.image_sizes,
    vision_feature_select_strategy="full",
)
Expected behavior
I expected the forward pass to complete, but the following line raises an error:
input_embeds = model(inputs.input_ids, pixel_values=inputs.pixel_values, image_sizes=inputs.image_sizes, vision_feature_select_strategy="full")
The traceback ends with:
in pack_image_features
image_feature = image_feature.view(num_patch_height, num_patch_width, height, width, -1)
RuntimeError: shape '[2, 2, 24, 24, -1]' is invalid for input of size 9453568
At this point the shape of image_feature is [4, 577, 4096]. How can I fix this?
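The numbers in the error message can be checked by hand. The sketch below uses only the shapes reported above and plain arithmetic, with no model required. It assumes (this is not confirmed in the report) that the 577 tokens per tile are 24 x 24 = 576 patch tokens plus one extra class token that the "full" strategy keeps. The names `target_block`, `reported` and `trimmed` are illustrative only.

```python
import math

# Values copied from the error message: shape '[2, 2, 24, 24, -1]'.
num_patch_height, num_patch_width = 2, 2
height = width = 24

# view(...) needs the element count to be divisible by the product
# of the four fixed dimensions; the -1 dimension absorbs the rest.
target_block = num_patch_height * num_patch_width * height * width

# Reported shape of image_feature: [4, 577, 4096].
reported = math.prod((4, 577, 4096))
fits_reported = reported % target_block == 0

# Assumption: with one token fewer per tile (576 = 24 * 24),
# the same view would be valid.
trimmed = math.prod((4, 576, 4096))
fits_trimmed = trimmed % target_block == 0

print(reported, fits_reported)
print(trimmed, fits_trimmed, trimmed // target_block)
```

Under that assumption, the mismatch comes from the single extra token per tile: 4 x 577 tokens cannot be arranged into a 2 x 2 grid of 24 x 24 tiles, whereas 4 x 576 can, leaving 4096 as the final dimension.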