Skip to content

Adds multimodal support #675

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 39 commits into
base: main
Choose a base branch
from
Open

Adds multimodal support #675

wants to merge 39 commits into from

Conversation

NathanHB
Copy link
Member

@NathanHB NathanHB commented Apr 15, 2025

Aims to add multimodal support for transformers model by creating the VLMTransformersModel.

  • Adds the MMMU task

  • modify the prompt manager to support multimodal input (images for now)

  • adds a lighteval accelerate vlm cli entry for creating the right config and using the VLMTransformersModel

  • tests failing is because of shortcut it's normal

To test / use:

uv run lighteval accelerate "model_name=HuggingFaceTB/SmolVLM-Instruct" "lighteval|mmmu_pro|0|0" --use-chat-template

@HuggingFaceDocBuilderDev
Copy link
Collaborator

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need working will mainly be here, first is to have the greedy untill function working

Comment on lines 498 to 502
# TODO: What is the best option to pass images to the requests?
# dirty hack for now
for reqs in requests.values():
for req in reqs:
req.specific = formatted_doc.specific
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the best option to pass images to the requests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now it would be using specifics but we could also add an images field to the request that default to None

@NathanHB NathanHB added the feature/enhancement New feature/request label May 5, 2025
Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @NathanHB, thanks for reviewing the PR. As we discussed internally, the only thing left is the tests, and the rest can be merged in the current state.

I also plan to conduct more experiments to evaluate VLM models after my vacation, and maybe, we will already have some user feedback to improve the evaluation.

I also left some comments, feel free to resolve them (if needed) while adding tests 🤗 Thanks!

Comment on lines -70 to -73
# We make sure the requests contain the tokenized versions of their values
if any(r.tokenized_context is None for r in requests):
raise ValueError("You passed a request for which tokenization had not happened yet.")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be reverted?

Comment on lines -273 to +269
toks = request.tokenized_context
toks = request.context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this one as well?

Comment on lines +581 to +589
# TODO: debug purpose, to remove later
import os

debug_samples = int(os.getenv("DATASET_SAMPLES", 0))
if debug_samples > 0:
for dataset in datasets:
for split in dataset.keys():
dataset[split] = dataset[split].select(range(debug_samples))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should be removed, is there a better way to run in debug mode? I can add it later in a follow-up PR

@NathanHB
Copy link
Member Author

hey @qubvel Taking care of adding tests and removing unneded stuff, thank you so much for the help in adding this feature :)

@NathanHB NathanHB linked an issue May 15, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/enhancement New feature/request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FT] Add multimodal for transformers models
3 participants