feat(transformers/models): add models of d_fine, granitemoehybrid and efficientloftr #1405
base: master
Conversation
Summary of Changes
Hello @The-truthh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request introduces three new advanced models to the repository.
Code Review
This pull request introduces three new models: d_fine, granitemoehybrid, and efficientloftr. My review focuses on the implementation of these new models. I've identified a couple of critical issues in efficientloftr and granitemoehybrid that could cause runtime errors, particularly related to Grouped-Query Attention in EfficientLoFTRAttention and convolution cache initialization in GraniteMoeHybridMambaLayer. I have also found a minor copy-paste error in an error message. The changes are otherwise well-structured and follow the repository's patterns for adding new models.
```python
is_cross_attention = encoder_hidden_states is not None
current_states = encoder_hidden_states if is_cross_attention else hidden_states

key_states = self.k_proj(current_states).view(batch_size, seq_len, -1, dim)
```
The reshaping of key_states on this line is incorrect when Grouped-Query Attention (GQA) is used (i.e., num_key_value_heads < num_attention_heads). The output of self.k_proj will have a size of num_key_value_heads * head_dim, which is smaller than dim (which is hidden_size). Consequently, the .view() operation will fail at runtime. The subsequent Rotary Position Embedding (RoPE) application is also incompatible with GQA as it expects query and key to have the same dimension as the position embeddings (hidden_size). This indicates that the attention layer does not correctly support GQA as suggested by the configuration options.
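For reference, a GQA-compatible projection reshapes keys and values with the number of key/value heads rather than the full hidden size, and then repeats (or broadcasts) them across the query groups. Below is a minimal, self-contained sketch of the shape arithmetic using NumPy and made-up sizes; names such as num_key_value_heads and head_dim are illustrative assumptions, not code from this PR.

```python
import numpy as np

batch_size, seq_len, hidden_size = 2, 16, 256
num_attention_heads, num_key_value_heads = 8, 2   # GQA: fewer KV heads than query heads
head_dim = hidden_size // num_attention_heads      # 32

# k_proj maps hidden_size -> num_key_value_heads * head_dim (= 64 here),
# so .view(batch_size, seq_len, -1, hidden_size) cannot succeed.
key_proj_out = np.zeros((batch_size, seq_len, num_key_value_heads * head_dim))

# GQA-style reshape: split into KV heads, then repeat each KV head
# across its group of query heads before computing attention.
key_states = key_proj_out.reshape(batch_size, seq_len, num_key_value_heads, head_dim)
key_states = np.repeat(key_states, num_attention_heads // num_key_value_heads, axis=2)
assert key_states.shape == (batch_size, seq_len, num_attention_heads, head_dim)
```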
```python
hidden_states_B_C_transposed = hidden_states_B_C.swapaxes(1, 2)
conv_states = mint.nn.functional.pad(
    hidden_states_B_C_transposed, (self.conv_kernel_size - hidden_states_B_C_transposed.shape[-1], 0)
)
cache_params.conv_states[self.layer_idx].copy_(conv_states)
```
The logic for initializing the convolution cache is incorrect. On line 515, mint.nn.functional.pad is called with a padding size calculated as self.conv_kernel_size - hidden_states_B_C_transposed.shape[-1]. If the sequence length is greater than self.conv_kernel_size, this results in a negative padding size, which will cause a runtime error. The cache should be initialized with the last self.conv_kernel_size tokens of the input sequence, padding only if the sequence is shorter.
Suggested change:
```python
if cache_params is not None:
    hidden_states_B_C_transposed = hidden_states_B_C.swapaxes(1, 2)
    seq_len = hidden_states_B_C_transposed.shape[-1]
    conv_kernel_size = self.conv_kernel_size
    if seq_len < conv_kernel_size:
        conv_states = mint.nn.functional.pad(
            hidden_states_B_C_transposed, (conv_kernel_size - seq_len, 0)
        )
    else:
        conv_states = hidden_states_B_C_transposed[..., -conv_kernel_size:]
    cache_params.conv_states[self.layer_idx].copy_(conv_states)
```
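As a sanity check on the suggestion above, here is a minimal, self-contained sketch of the pad-or-slice behaviour using NumPy and made-up sizes (the helper name init_conv_state is hypothetical, not part of the model code):

```python
import numpy as np

def init_conv_state(x, conv_kernel_size):
    # x has shape (batch, channels, seq_len); keep the last
    # conv_kernel_size steps, left-padding with zeros only when
    # the sequence is shorter than the kernel.
    seq_len = x.shape[-1]
    if seq_len < conv_kernel_size:
        pad = conv_kernel_size - seq_len
        return np.pad(x, ((0, 0), (0, 0), (pad, 0)))
    return x[..., -conv_kernel_size:]

short = np.ones((1, 4, 2))   # seq_len=2 < kernel=4 -> padded
long = np.ones((1, 4, 7))    # seq_len=7 > kernel=4 -> sliced
assert init_conv_state(short, 4).shape == (1, 4, 4)
assert init_conv_state(long, 4).shape == (1, 4, 4)
```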
````python
>>> outputs = model(**inputs)
```"""
if labels is not None:
    raise ValueError("SuperGlue is not trainable, no labels should be provided.")
````
The error message incorrectly refers to "SuperGlue" instead of "EfficientLoFTR". This appears to be a copy-paste error and could be confusing for users.
| raise ValueError("SuperGlue is not trainable, no labels should be provided.") | |
| raise ValueError("EfficientLoFTR is not trainable, no labels should be provided.") |
The branch was force-pushed from a0f245a to 5a12455, then from f8122a9 to 3241943, and from 3241943 to 0a78b8f.
What does this PR do?
Fixes # (issue)
Adds # (feature)
Before submitting
What's New. Here are the documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@xxx