I have tried to get InternVL2.5-8B to output the bounding boxes of the elements in an image, but it doesn't work. My prompt is:
<image>\nPlease detect and provide all the bounding box of <ref>car</ref>,<ref>truck</ref>,<ref>van</ref>,<ref>bus</ref>,
<ref>pedestrian</ref>,<ref>cyclist</ref>,<ref>tricyclist</ref>,<ref>motorcyclist</ref> in the following image.
The answer looks like:
Please detect and label all <ref>car</ref>,<ref>truck</ref>,<ref>van</ref>,<ref>bus</ref>,
<ref>pedestrian</ref>,<ref>cyclist</ref>,<ref>tricyclist</ref>,<ref>motorcyclist</ref> in the following image and mark their positions.
Assistant: car[[75, 658, 250, 871], [286, 558, 444, 711]]
truck[[440, 208, 502, 255]]
van[[446, 240, 527, 311]]
bus[[500, 244, 616, 363]]
pedestrian[[0, 1000, 999, 998]]
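For reference, the answer above can be parsed and sanity-checked with a short script. This is only a minimal sketch: `parse_boxes`, `is_valid`, and `to_pixels` are hypothetical helper names, and it assumes the coordinates are `[x1, y1, x2, y2]` normalized to a 0-1000 range, which matches the values printed above but should be checked against the InternVL documentation. Under that assumption, the `pedestrian` box is degenerate because its `y1` (1000) is larger than its `y2` (998).

```python
import re

# One answer line looks like: label[[x1, y1, x2, y2], [x1, y1, x2, y2]]
# An optional "Assistant:" prefix is tolerated, as in the output above.
LINE_RE = re.compile(r"^(?:Assistant:\s*)?(\w+)(\[\[.*\]\])\s*$")
BOX_RE = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_boxes(answer):
    """Return {label: [(x1, y1, x2, y2), ...]} from the model's text answer."""
    result = {}
    for line in answer.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip echoed prompt lines and anything else
        label, body = match.group(1), match.group(2)
        boxes = [tuple(int(v) for v in box) for box in BOX_RE.findall(body)]
        result.setdefault(label, []).extend(boxes)
    return result


def is_valid(box, scale=1000):
    """Check ordering and range, ASSUMING coordinates normalized to [0, scale]."""
    x1, y1, x2, y2 = box
    return 0 <= x1 < x2 <= scale and 0 <= y1 < y2 <= scale


def to_pixels(box, width, height, scale=1000):
    """Rescale a normalized box to pixel coordinates (same assumption as above)."""
    x1, y1, x2, y2 = box
    return (x1 * width / scale, y1 * height / scale,
            x2 * width / scale, y2 * height / scale)


if __name__ == "__main__":
    answer = (
        "Assistant: car[[75, 658, 250, 871], [286, 558, 444, 711]]\n"
        "truck[[440, 208, 502, 255]]\n"
        "pedestrian[[0, 1000, 999, 998]]"
    )
    for label, boxes in parse_boxes(answer).items():
        for box in boxes:
            print(label, box, "ok" if is_valid(box) else "degenerate")
```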
So I tried the prompt shown in https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html#grounding-detection-data:
<image>\nPlease provide the bounding box coordinate of <ref>car</ref> in the image.
Then the answer looks like:
User: <image>
Please provide the bounding box coordinate of <ref>car</ref> in the image.
Assistant: car[[561, 660, 695, 1000]]
Questions/Concerns:
Is there a recommended prompt format or additional instructions for detecting multiple objects at once according to the InternVL documentation?
Can this be solved by changing the prompt, or do I have to fine-tune the model myself?
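One possible workaround while this is open is to send the documented single-class prompt once per category and merge the answers. The sketch below only builds the prompts. `build_prompts` is a hypothetical helper, the actual model call is left out because it depends on the inference setup, and whether this gives better boxes than the multi-class prompt is untested.

```python
# Categories taken from the multi-class prompt in this issue.
CATEGORIES = [
    "car", "truck", "van", "bus",
    "pedestrian", "cyclist", "tricyclist", "motorcyclist",
]

# Single-class prompt wording copied from the grounding example quoted above.
TEMPLATE = (
    "<image>\nPlease provide the bounding box coordinate of "
    "<ref>{name}</ref> in the image."
)


def build_prompts(categories):
    """Return {category: prompt}, one single-class grounding prompt each."""
    return {name: TEMPLATE.format(name=name) for name in categories}


if __name__ == "__main__":
    for name, prompt in build_prompts(CATEGORIES).items():
        # Each prompt would be sent to the model together with the same image.
        print(repr(prompt))
```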