Adds multimodal support #675
Conversation
The work needed will mainly be here; the first step is to get the greedy until function working.
# TODO: What is the best option to pass images to the requests?
# dirty hack for now
for reqs in requests.values():
    for req in reqs:
        req.specific = formatted_doc.specific
What is the best option to pass images to the requests?
For now it would be using specifics, but we could also add an images field to the request that defaults to None.
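A minimal sketch of the alternative mentioned above, assuming a dataclass-style request object. The class shape, the `images` field, and the `build_requests` helper are illustrative assumptions, not lighteval's actual definitions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    # Existing fields are elided; `context` stands in for the prompt text.
    context: str
    # Instead of smuggling images through `specific`, the request carries
    # them directly, defaulting to None for text-only tasks.
    images: Optional[list] = None

def build_requests(formatted_doc, contexts):
    # Multimodal tasks attach the document's images explicitly; text-only
    # documents simply leave `images` as None.
    return [
        Request(context=c, images=getattr(formatted_doc, "images", None))
        for c in contexts
    ]
```

This keeps the multimodal payload typed and visible on the request itself rather than hidden inside a generic `specific` dict.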
Hey @NathanHB, thanks for reviewing the PR. As we discussed internally, the only thing left is the tests, and the rest can be merged in the current state.
I also plan to conduct more experiments to evaluate VLM models after my vacation, and maybe by then we will already have some user feedback to improve the evaluation.
I also left some comments, feel free to resolve them (if needed) while adding tests 🤗 Thanks!
# We make sure the requests contain the tokenized versions of their values
if any(r.tokenized_context is None for r in requests):
    raise ValueError("You passed a request for which tokenization had not happened yet.")
Should it be reverted?
toks = request.tokenized_context
toks = request.context
and this one as well?
# TODO: debug purpose, to remove later
import os

debug_samples = int(os.getenv("DATASET_SAMPLES", 0))
if debug_samples > 0:
    for dataset in datasets:
        for split in dataset.keys():
            dataset[split] = dataset[split].select(range(debug_samples))
This one should be removed. Is there a better way to run in debug mode? I can add it later in a follow-up PR.
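One possible alternative to the environment variable, sketched here under the assumption of an argparse-based CLI: a `--max-samples` flag makes the debug truncation explicit in the command line. The flag name and the `truncate_splits` helper are illustrative, not lighteval's actual API.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--max-samples",
    type=int,
    default=None,
    help="If set, truncate every dataset split to this many samples (debug only).",
)

def truncate_splits(dataset, max_samples):
    # `dataset` is assumed to map split names to objects exposing
    # .select(range(...)), as in `datasets.DatasetDict`.
    if max_samples is not None and max_samples > 0:
        for split in dataset.keys():
            dataset[split] = dataset[split].select(range(max_samples))
    return dataset
```

Compared to `DATASET_SAMPLES`, the flag surfaces in `--help` output and in logged commands, so debug runs are harder to mistake for full evaluations.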
Hey @qubvel, taking care of adding tests and removing unneeded stuff, thank you so much for the help in adding this feature :)
Aims to add multimodal support for transformers models by creating the VLMTransformersModel.
- Adds the MMMU task
- Modifies the prompt manager to support multimodal input (images for now)
- Adds a lighteval accelerate vlm CLI entry for creating the right config and using the VLMTransformersModel

The failing tests are due to a shortcut; that's expected.
To test / use: