Qwen2.5 VL fails to train due to qwen-vl-utils #227

@David-rn

Description

Search before asking

  • I have searched the Multimodal Maestro issues and found no similar bug report.

Bug

Hi! First of all thank you for this amazing library! 🔥

I was trying to fine-tune Qwen2.5-VL following the Colab example when this error appeared:

File "/home/dredo/anaconda3/envs/occluders/lib/python3.12/site-packages/maestro/trainer/models/qwen_2_5_vl/detection.py", line 14, in detections_to_suffix_formatter
    input_h, input_w = smart_resize(height=image_h, width=image_w, min_pixels=min_pixels, max_pixels=max_pixels)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: smart_resize() missing 1 required positional argument: 'factor'

I noticed that the qwen-vl-utils library removed the default value of the factor argument yesterday in this commit. I couldn't find out why.

The error only appears when loading a COCO dataset, because that code path calls smart_resize from detections_to_suffix_formatter without passing factor.

Workaround

  • Pinning qwen-vl-utils <= 0.0.11 worked for me

  • Alternatively, pass factor explicitly in the smart_resize call, using the default value it used to have (IMAGE_FACTOR = 28)
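
The second workaround could be sketched roughly as below. This is only an illustration, not the maestro or qwen-vl-utils code: `with_default_factor` is a hypothetical helper, and `fake_smart_resize` is a simplified stand-in that mimics only the new signature (factor without a default), so the snippet runs without the real library installed. The value 28 is the former default mentioned above.

```python
import functools
import inspect

IMAGE_FACTOR = 28  # former default value of `factor`, as stated in this issue


def with_default_factor(resize_fn, factor=IMAGE_FACTOR):
    """Hypothetical helper: pre-fill `factor` if resize_fn requires it."""
    param = inspect.signature(resize_fn).parameters.get("factor")
    # Leave the function untouched if it has no `factor` or already has a default.
    if param is None or param.default is not inspect.Parameter.empty:
        return resize_fn
    return functools.partial(resize_fn, factor=factor)


def fake_smart_resize(height, width, factor, min_pixels=None, max_pixels=None):
    """Stand-in for the new signature only; NOT the real resizing logic."""
    # Round each side to the nearest multiple of `factor`.
    return round(height / factor) * factor, round(width / factor) * factor


# Wrap once, then call with keyword arguments as in the traceback above.
smart_resize = with_default_factor(fake_smart_resize)
input_h, input_w = smart_resize(height=480, width=640, min_pixels=1, max_pixels=10**9)
```

Because the wrapper binds factor as a keyword, it assumes callers pass the other arguments by keyword too, as the call in detections_to_suffix_formatter does. With the real library one would presumably wrap the imported smart_resize instead of the stand-in.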

Environment

  • maestro: maestro[qwen_2_5_vl]==1.1.0rc3
  • OS: Ubuntu 22.04
  • Python: 3.12

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Labels

  • bug: Something isn't working