
Conversation

@The-truthh
Contributor

What does this PR do?

Fixes # (issue)

Adds # (feature)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g., record bug fixes or new features in What's New. Here are the documentation guidelines.
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (It is best to record this for data loading, model inference, or training tasks.)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@The-truthh The-truthh requested a review from vigo999 as a code owner on October 30, 2025 07:15
@gemini-code-assist
Contributor

Summary of Changes

Hello @The-truthh, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces three new advanced models to the mindone.transformers library: d_fine, EfficientLoFTR, and GraniteMoeHybrid. The EfficientLoFTR model focuses on efficient keypoint matching for computer vision tasks, while the GraniteMoeHybrid model brings a cutting-edge hybrid Mamba and Attention architecture with Mixture-of-Experts to causal language modeling. These additions aim to broaden the library's capabilities and offer more diverse options for various AI applications.

Highlights

  • New Model Additions: Introduced three new transformer models: d_fine, EfficientLoFTR, and GraniteMoeHybrid, significantly expanding the model zoo within the mindone.transformers library.
  • EfficientLoFTR for Keypoint Matching: Added the EfficientLoFTR model, designed for semi-dense local feature matching with sparse-like speed, including its RepVGG backbone, local feature transformer, and fine fusion layer.
  • GraniteMoeHybrid for Causal Language Modeling: Implemented the GraniteMoeHybrid model, which features a novel hybrid architecture combining Mamba and Attention layers, along with a Mixture-of-Experts (MoE) gating mechanism for enhanced performance and efficiency in causal language modeling tasks (see the usage sketch after this list).
  • Module Integration and Testing: Integrated the new models into the mindone.transformers module structure and included dedicated unit tests for the EfficientLoFTR model to ensure its correctness and compatibility.
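
As a quick orientation for reviewers, here is a hypothetical usage sketch for the new causal LM. It assumes mindone.transformers mirrors the Hugging Face transformers auto-class and generation API (the usual pattern in this repository); the checkpoint id below is a placeholder, not part of this PR.

```python
# Hypothetical usage sketch, not taken from this PR: assumes
# mindone.transformers exposes AutoModelForCausalLM and generate() in the
# usual Hugging Face style; the model id is a placeholder.
import mindspore as ms
from transformers import AutoTokenizer
from mindone.transformers import AutoModelForCausalLM

model_id = "ibm-granite/granite-4.0-tiny-preview"  # placeholder checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize to NumPy, then hand MindSpore tensors to the model.
inputs = tokenizer("Hybrid Mamba/Attention models", return_tensors="np")
input_ids = ms.Tensor(inputs["input_ids"])
output_ids = model.generate(input_ids=input_ids, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```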
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
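
For example, commenting `/gemini review` on this pull request triggers a fresh review of its current state.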

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces three new models: d_fine, granitemoehybrid, and efficientloftr. My review focuses on the implementation of these new models. I've identified a couple of critical issues in efficientloftr and granitemoehybrid that could cause runtime errors, particularly related to Grouped-Query Attention in EfficientLoFTRAttention and convolution cache initialization in GraniteMoeHybridMambaLayer. I have also found a minor copy-paste error in an error message. The changes are otherwise well-structured and follow the repository's patterns for adding new models.

```python
is_cross_attention = encoder_hidden_states is not None
current_states = encoder_hidden_states if is_cross_attention else hidden_states

key_states = self.k_proj(current_states).view(batch_size, seq_len, -1, dim)
```


critical

The reshaping of key_states on this line is incorrect when Grouped-Query Attention (GQA) is used (i.e., num_key_value_heads < num_attention_heads). The output of self.k_proj will have a size of num_key_value_heads * head_dim, which is smaller than dim (which is hidden_size). Consequently, the .view() operation will fail at runtime. The subsequent Rotary Position Embedding (RoPE) application is also incompatible with GQA as it expects query and key to have the same dimension as the position embeddings (hidden_size). This indicates that the attention layer does not correctly support GQA as suggested by the configuration options.
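
For context, a minimal sketch of the GQA-compatible reshape this comment describes. It builds on the quoted snippet above; `num_key_value_heads`, `num_attention_heads`, and `head_dim` are assumed module attributes following the common transformers-style pattern, not necessarily this PR's exact names.

```python
# Minimal sketch of a GQA-aware reshape (attribute names are assumptions,
# following the common transformers pattern; this is not the PR's code).
key_states = self.k_proj(current_states).view(
    batch_size, seq_len, self.num_key_value_heads, self.head_dim
)
# Each KV head serves num_attention_heads // num_key_value_heads query heads,
# so repeat the KV heads along the head axis before attention (repeat_kv).
n_rep = self.num_attention_heads // self.num_key_value_heads
key_states = key_states.repeat_interleave(n_rep, dim=2)
```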

```python
hidden_states_B_C_transposed = hidden_states_B_C.swapaxes(1, 2)
conv_states = mint.nn.functional.pad(
    hidden_states_B_C_transposed, (self.conv_kernel_size - hidden_states_B_C_transposed.shape[-1], 0)
)
cache_params.conv_states[self.layer_idx].copy_(conv_states)
```


critical

The logic for initializing the convolution cache is incorrect. On line 515, mint.nn.functional.pad is called with a padding size calculated as self.conv_kernel_size - hidden_states_B_C_transposed.shape[-1]. If the sequence length is greater than self.conv_kernel_size, this results in a negative padding size, which will cause a runtime error. The cache should be initialized with the last self.conv_kernel_size tokens of the input sequence, padding only if the sequence is shorter.

```python
            if cache_params is not None:
                hidden_states_B_C_transposed = hidden_states_B_C.swapaxes(1, 2)
                seq_len = hidden_states_B_C_transposed.shape[-1]
                conv_kernel_size = self.conv_kernel_size
                if seq_len < conv_kernel_size:
                    conv_states = mint.nn.functional.pad(
                        hidden_states_B_C_transposed, (conv_kernel_size - seq_len, 0)
                    )
                else:
                    conv_states = hidden_states_B_C_transposed[..., -conv_kernel_size:]
                cache_params.conv_states[self.layer_idx].copy_(conv_states)
```
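
To make the failure mode concrete, here is a small shape sketch with made-up values (assuming `mint.ones` behaves like its PyTorch counterpart):

```python
# Illustrative values only: with conv_kernel_size = 4 and seq_len = 6, the
# original code computes a negative padding amount, while the fix keeps the
# last conv_kernel_size positions of the sequence.
from mindspore import mint

x = mint.ones((1, 8, 6))                  # (batch, channels, seq_len)
conv_kernel_size = 4
pad = conv_kernel_size - x.shape[-1]      # 4 - 6 = -2 -> invalid padding size
conv_states = x[..., -conv_kernel_size:]  # shape (1, 8, 4), as in the fix above
print(pad, conv_states.shape)
```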

````python
        >>> outputs = model(**inputs)
        ```"""
        if labels is not None:
            raise ValueError("SuperGlue is not trainable, no labels should be provided.")
````


medium

The error message incorrectly refers to "SuperGlue" instead of "EfficientLoFTR". This appears to be a copy-paste error and could be confusing for users.

Suggested change

```diff
-raise ValueError("SuperGlue is not trainable, no labels should be provided.")
+raise ValueError("EfficientLoFTR is not trainable, no labels should be provided.")
```

@The-truthh The-truthh force-pushed the transformers-aimv2 branch 2 times, most recently from a0f245a to 5a12455 on October 30, 2025 07:45
@The-truthh The-truthh force-pushed the transformers-aimv2 branch 13 times, most recently from f8122a9 to 3241943 on November 3, 2025 06:48