How to use the grounding ability for multiple objects and categories? #953

Open
ddsk1 opened this issue Mar 13, 2025 · 0 comments

ddsk1 commented Mar 13, 2025

I have tried to make InternVL2.5-8B output the bounding boxes of the objects in the image, but it doesn't work. My prompt is:

<image>\nPlease detect and provide all the bounding box of <ref>car</ref>,<ref>truck</ref>,<ref>van</ref>,<ref>bus</ref>,
<ref>pedestrian</ref>,<ref>cyclist</ref>,<ref>tricyclist</ref>,<ref>motorcyclist</ref> in the following image.

The answer looks like this:

Please detect and label all <ref>car</ref>,<ref>truck</ref>,<ref>van</ref>,<ref>bus</ref>,
<ref>pedestrian</ref>,<ref>cyclist</ref>,<ref>tricyclist</ref>,<ref>motorcyclist</ref> in the following image and mark their positions.
Assistant: car[[75, 658, 250, 871], [286, 558, 444, 711]]
truck[[440, 208, 502, 255]]
van[[446, 240, 527, 311]]
bus[[500, 244, 616, 363]]
pedestrian[[0, 1000, 999, 998]]
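
To make the problem in the reply above easier to check, here is a minimal sketch that parses the `label[[x1, y1, x2, y2], ...]` text and flags degenerate boxes. The helper names `parse_boxes` and `is_valid` are hypothetical, and the check assumes coordinates are `[x1, y1, x2, y2]` normalized to the range 0 to 1000, which is my reading of the values above rather than something confirmed here.

```python
import re

# One "label[[...], [...]]" group, as it appears in the reply text.
GROUP_RE = re.compile(r"(\w+)\s*(\[\[.*?\]\])")
# One "[x1, y1, x2, y2]" box made of four integers.
BOX_RE = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_boxes(reply):
    """Return {label: [(x1, y1, x2, y2), ...]} parsed from a grounding reply."""
    result = {}
    for label, body in GROUP_RE.findall(reply):
        boxes = [tuple(int(v) for v in match) for match in BOX_RE.findall(body)]
        result.setdefault(label, []).extend(boxes)
    return result


def is_valid(box, limit=1000):
    """Assumption: a usable box has x1 < x2 and y1 < y2, all within [0, limit]."""
    x1, y1, x2, y2 = box
    return 0 <= x1 < x2 <= limit and 0 <= y1 < y2 <= limit


if __name__ == "__main__":
    reply = (
        "Assistant: car[[75, 658, 250, 871], [286, 558, 444, 711]]\n"
        "truck[[440, 208, 502, 255]]\n"
        "pedestrian[[0, 1000, 999, 998]]"
    )
    for label, boxes in parse_boxes(reply).items():
        for box in boxes:
            print(label, box, "ok" if is_valid(box) else "degenerate")
```

Under these assumptions the `pedestrian` box fails the check because its `y1` (1000) is larger than its `y2` (998), and `cyclist`, `tricyclist` and `motorcyclist` never appear in the parsed result at all.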

So I tried the prompt shown in https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html#grounding-detection-data:

<image>\nPlease provide the bounding box coordinate of <ref>car</ref> in the image.

Then the answer looks like this:

User: <image>
Please provide the bounding box coordinate of <ref>car</ref> in the image.
Assistant: car[[561, 660, 695, 1000]]

Questions/Concerns:

Is there a recommended prompt format or additional instructions for detecting multiple objects at once according to the InternVL documentation?
How should I structure the prompt, or do I have to fine-tune the model myself?
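
One workaround I could try before fine-tuning is to send the documented single-category prompt once per category and collect the replies. The sketch below only builds the prompts and loops over them; `ask` is a hypothetical callable standing in for whatever function actually sends a prompt plus the image to the model, and I have not verified that this returns every instance of a category.

```python
from typing import Callable, Dict, List

# Template taken from the single-category prompt in the linked documentation.
TEMPLATE = (
    "<image>\nPlease provide the bounding box coordinate of "
    "<ref>{name}</ref> in the image."
)

CATEGORIES = [
    "car", "truck", "van", "bus",
    "pedestrian", "cyclist", "tricyclist", "motorcyclist",
]


def build_prompts(categories: List[str]) -> Dict[str, str]:
    """Return one single-category grounding prompt per category name."""
    return {name: TEMPLATE.format(name=name) for name in categories}


def detect_all(categories: List[str], ask: Callable[[str], str]) -> Dict[str, str]:
    """Call the hypothetical `ask` once per category and keep the raw replies."""
    return {name: ask(prompt) for name, prompt in build_prompts(categories).items()}


if __name__ == "__main__":
    # Stand-in for the real model call, so the loop can be run without a model.
    def fake_ask(prompt: str) -> str:
        return "(model reply for: " + prompt.splitlines()[-1] + ")"

    for name, reply in detect_all(CATEGORIES, fake_ask).items():
        print(name, "->", reply)
```

This costs one model call per category, so it is slower than a single multi-category prompt, but each request stays in the format the documentation shows.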
