
WIP: MLLaMA-Vision Model Implementation with Candle #2773

Draft · wants to merge 6 commits into main
Conversation

elk-cloner

Note: I'm new to Rust and actively learning while implementing this - feedback on idiomatic Rust patterns is very welcome! 🦀

Implementation Guide

Trying to implement meta-llama/Llama-3.2-11B-Vision by following the model conversion guide for pretrained models.

Current Implementation Status

TODO

  • Vision Model Testing
    • Compare outputs with the reference implementation
  • Language Model Implementation
    • Investigate reusing existing LLaMA model with cross-attention layer
    • Potential future enhancements:
      • KV cache
      • Speculative decoding
      • Continuous batching
  • Language Model Testing
  • MLLaMAConditionalGeneration Implementation
  • End-to-End Model Testing (Vision + Language)
  • Code Refactoring for Idiomatic Rust
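Of the enhancements listed above, the KV cache is a natural first step: during autoregressive decoding, the key/value projections for past tokens are computed once, cached, and reused, so each step only projects the newest token. Below is a minimal, framework-free sketch in plain Rust; the struct and method names are hypothetical illustrations, not the candle API.

```rust
// Minimal KV-cache sketch (hypothetical names, plain Rust, no candle types).
// Keys/values for past tokens are appended once and reused, so each decoding
// step only needs the projections for the newest token.

#[derive(Default)]
struct KvCache {
    keys: Vec<Vec<f32>>,   // one entry per decoded token
    values: Vec<Vec<f32>>,
}

impl KvCache {
    /// Append the newest token's projections and return the full cached
    /// sequence, which the attention computation then runs over.
    fn append(&mut self, k: Vec<f32>, v: Vec<f32>) -> (&[Vec<f32>], &[Vec<f32>]) {
        self.keys.push(k);
        self.values.push(v);
        (&self.keys, &self.values)
    }

    fn len(&self) -> usize {
        self.keys.len()
    }
}

fn main() {
    let mut cache = KvCache::default();
    for step in 0..3 {
        // In a real model these would come from the k/v projections
        // applied to the hidden state of the new token.
        let (ks, vs) = cache.append(vec![step as f32; 4], vec![step as f32; 4]);
        assert_eq!(ks.len(), step + 1);
        assert_eq!(vs.len(), step + 1);
    }
    println!("cached tokens: {}", cache.len()); // → cached tokens: 3
}
```

In a real implementation the cache would hold candle tensors and be concatenated along the sequence dimension, but the bookkeeping is the same.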

Questions/Discussion Points

  1. Can we leverage the existing LLaMA model implementation by adding a cross-attention layer?
  2. Looking for guidance on making the code more idiomatic Rust
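To make question 1 concrete: in the MLLaMA architecture, cross-attention layers let text hidden states act as queries while the vision encoder's outputs supply the keys and values. Here is a minimal numeric sketch of that pattern in plain Rust; the function names and `Vec<Vec<f32>>` matrices are illustrative assumptions, since real code would use candle tensors and learned projection layers.

```rust
// Scaled dot-product cross-attention sketch (hypothetical, plain Rust).
// Text hidden states supply the queries; vision features supply keys/values.

fn matmul(a: &[Vec<f32>], b: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..n {
        for p in 0..k {
            for j in 0..m {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

fn softmax_rows(m: &mut [Vec<f32>]) {
    for row in m.iter_mut() {
        let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0.0;
        for x in row.iter_mut() {
            *x = (*x - max).exp();
            sum += *x;
        }
        for x in row.iter_mut() {
            *x /= sum;
        }
    }
}

/// Cross-attention: softmax(Q K^T / sqrt(d)) V, with Q from text tokens
/// and K, V from vision patches.
fn cross_attention(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let d = q[0].len() as f32;
    // Transpose K so scores[i][j] = q_i · k_j.
    let kt: Vec<Vec<f32>> = (0..k[0].len())
        .map(|j| k.iter().map(|row| row[j]).collect())
        .collect();
    let mut scores = matmul(q, &kt);
    for row in scores.iter_mut() {
        for x in row.iter_mut() {
            *x /= d.sqrt();
        }
    }
    softmax_rows(&mut scores);
    matmul(&scores, v) // weighted sum of vision values per text token
}

fn main() {
    // 2 text tokens attending over 3 vision patches, hidden dim 4.
    let q = vec![vec![0.1; 4], vec![0.2; 4]];
    let k = vec![vec![0.3; 4], vec![0.1; 4], vec![0.5; 4]];
    let v = vec![vec![1.0; 4], vec![2.0; 4], vec![3.0; 4]];
    let out = cross_attention(&q, &k, &v);
    assert_eq!(out.len(), 2);    // one output row per text token
    assert_eq!(out[0].len(), 4); // same hidden dim as the values
    println!("output rows: {}", out.len());
}
```

If the existing LLaMA implementation is reused, the change would amount to interleaving blocks like this (with learned q/k/v/o projections and gating) between the standard self-attention decoder layers.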

Help Needed

  • Rust best practices and idioms
  • Any insights on model architecture decisions
  • Testing strategies

I appreciate any feedback or guidance, especially around Rust implementations and ML architecture decisions!

@EricLBuehler
Member

@elk-cloner this is super exciting! In case you are interested in a known-working implementation using Candle to help with any bugs, there is this: https://github.com/EricLBuehler/mistral.rs/tree/master/mistralrs-core/src/vision_models/mllama.

@elk-cloner
Author

Thanks a lot, @EricLBuehler. I wasn't aware of that repo. Does it still make sense to have this here? I had a quick look at the link and it seems I'm doing the same thing.

@LaurentMazare
Collaborator

I think it's a good thing to add to candle-transformers if you have the time to do it.
