pack_image_features RuntimeError when vision_feature_select_strategy="full"

### System Info

transformers 4.54.0

### Who can help?

@zucchini-nlp 

### Information

- [x] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

```python
import torch
from PIL import Image
from transformers.models.llava_next import LlavaNextForConditionalGeneration, LlavaNextProcessor

# Load the model with the "full" feature selection strategy (CLS token kept).
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-vicuna-7b-hf",
    vision_feature_select_strategy="full",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-vicuna-7b-hf")

image = Image.open("/data/coco/train2017/000000000009.jpg")
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(
    images=image,
    text=prompt,
    truncation=True,
    return_tensors="pt",
    vision_feature_select_strategy="full",
).to("cuda")

# This forward pass raises the RuntimeError in pack_image_features.
input_embeds = model(inputs.input_ids, pixel_values=inputs.pixel_values, image_sizes=inputs.image_sizes, vision_feature_select_strategy="full")
```

### Expected behavior

The forward call fails on this line:
`input_embeds = model(inputs.input_ids, pixel_values=inputs.pixel_values, image_sizes=inputs.image_sizes, vision_feature_select_strategy="full")`
with the following error:
```
 in pack_image_features
    image_feature = image_feature.view(num_patch_height, num_patch_width, height, width, -1)
RuntimeError: shape '[2, 2, 24, 24, -1]' is invalid for input of size 9453568
```


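At that point the shape of `image_feature` is `[4, 577, 4096]`, which accounts for the 9453568 elements in the error (4 × 577 × 4096). The target shape `[2, 2, 24, 24, -1]` needs an element count divisible by 2304 (2 × 2 × 24 × 24), so the `view` cannot succeed. How can I fix this?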
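To make the mismatch concrete, here is a small arithmetic check using only the numbers from the error message and the reported tensor shape. The variable names are mine, not from the library. The last part rests on an assumption I have not verified against the library code: that the 577th token per patch is the CLS token that `"full"` keeps and `"default"` drops.

```python
# Numbers copied from the traceback and the reported shape [4, 577, 4096].
num_patch_height, num_patch_width = 2, 2
height = width = 24
num_patches, tokens_per_patch, hidden_size = 4, 577, 4096

# Total number of elements in image_feature.
actual_size = num_patches * tokens_per_patch * hidden_size

# Number of token positions the target view [2, 2, 24, 24, -1] must fill.
grid_tokens = num_patch_height * num_patch_width * height * width

print(actual_size)                # 9453568, same as in the error message
print(grid_tokens)                # 2304
print(actual_size % grid_tokens)  # nonzero, so -1 cannot be inferred

# Assumption: one token per patch is the CLS token. Without it,
# each patch has 576 = 24 * 24 tokens and the view becomes valid.
tokens_without_cls = tokens_per_patch - 1
size_without_cls = num_patches * tokens_without_cls * hidden_size
print(size_without_cls % grid_tokens == 0)  # True
print(size_without_cls // grid_tokens)      # 4096, the hidden size
```

If that assumption holds, `pack_image_features` seems to expect exactly `height * width` tokens per patch, which would explain why the reshape only fails with `vision_feature_select_strategy="full"`.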
