
Conversation

@The-truthh
Contributor

What does this PR do?

Fixes # (issue)

Adds # (feature)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g., record bug fixes or new features in What's New. Here are the documentation guidelines.
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (It is best to record this for data loading, model inference, or training tasks.)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@The-truthh The-truthh requested a review from vigo999 as a code owner on October 30, 2025 07:15
@gemini-code-assist
Contributor

Summary of Changes

Hello @The-truthh, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces three new advanced models to the mindone.transformers library: d_fine, EfficientLoFTR, and GraniteMoeHybrid. The EfficientLoFTR model focuses on efficient keypoint matching for computer vision tasks, while the GraniteMoeHybrid model brings a cutting-edge hybrid Mamba and Attention architecture with Mixture-of-Experts to causal language modeling. These additions aim to broaden the library's capabilities and offer more diverse options for various AI applications.

Highlights

  • New Model Additions: Introduced three new transformer models: d_fine, EfficientLoFTR, and GraniteMoeHybrid, significantly expanding the model zoo within the mindone.transformers library.
  • EfficientLoFTR for Keypoint Matching: Added the EfficientLoFTR model, designed for semi-dense local feature matching with sparse-like speed, including its RepVGG backbone, local feature transformer, and fine fusion layer.
  • GraniteMoeHybrid for Causal Language Modeling: Implemented the GraniteMoeHybrid model, which features a novel hybrid architecture combining Mamba and Attention layers, along with a Mixture-of-Experts (MoE) gating mechanism for enhanced performance and efficiency in causal language modeling tasks (see the usage sketch after this list).
  • Module Integration and Testing: Integrated the new models into the mindone.transformers module structure and included dedicated unit tests for the EfficientLoFTR model to ensure its correctness and compatibility.
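
As a quick orientation for reviewers, here is a hypothetical usage sketch for the new causal LM. It assumes mindone.transformers mirrors the Hugging Face transformers auto-class and generation API (the usual pattern in this repository); the checkpoint id below is a placeholder, not part of this PR.

```python
# Hypothetical usage sketch, not taken from this PR: assumes
# mindone.transformers exposes AutoModelForCausalLM and generate() in the
# usual Hugging Face style; the model id is a placeholder.
import mindspore as ms
from transformers import AutoTokenizer
from mindone.transformers import AutoModelForCausalLM

model_id = "ibm-granite/granite-4.0-tiny-preview"  # placeholder checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize to NumPy, then hand MindSpore tensors to the model.
inputs = tokenizer("Hybrid Mamba/Attention models", return_tensors="np")
input_ids = ms.Tensor(inputs["input_ids"])
output_ids = model.generate(input_ids=input_ids, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```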
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
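
For example, commenting `/gemini review` on this pull request triggers a fresh review of its current state.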

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces three new models: d_fine, granitemoehybrid, and efficientloftr. My review focuses on the implementation of these new models. I've identified a couple of critical issues in efficientloftr and granitemoehybrid that could cause runtime errors, particularly related to Grouped-Query Attention in EfficientLoFTRAttention and convolution cache initialization in GraniteMoeHybridMambaLayer. I have also found a minor copy-paste error in an error message. The changes are otherwise well-structured and follow the repository's patterns for adding new models.

```python
is_cross_attention = encoder_hidden_states is not None
current_states = encoder_hidden_states if is_cross_attention else hidden_states

key_states = self.k_proj(current_states).view(batch_size, seq_len, -1, dim)
```


critical

The reshaping of key_states on this line is incorrect when Grouped-Query Attention (GQA) is used (i.e., num_key_value_heads < num_attention_heads). The output of self.k_proj will have a size of num_key_value_heads * head_dim, which is smaller than dim (which is hidden_size). Consequently, the .view() operation will fail at runtime. The subsequent Rotary Position Embedding (RoPE) application is also incompatible with GQA as it expects query and key to have the same dimension as the position embeddings (hidden_size). This indicates that the attention layer does not correctly support GQA as suggested by the configuration options.
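
For context, a minimal sketch of the GQA-compatible reshape this comment describes. It builds on the quoted snippet above; `num_key_value_heads`, `num_attention_heads`, and `head_dim` are assumed module attributes following the common transformers-style pattern, not necessarily this PR's exact names.

```python
# Minimal sketch of a GQA-aware reshape (attribute names are assumptions,
# following the common transformers pattern; this is not the PR's code).
key_states = self.k_proj(current_states).view(
    batch_size, seq_len, self.num_key_value_heads, self.head_dim
)
# Each KV head serves num_attention_heads // num_key_value_heads query heads,
# so repeat the KV heads along the head axis before attention (repeat_kv).
n_rep = self.num_attention_heads // self.num_key_value_heads
key_states = key_states.repeat_interleave(n_rep, dim=2)
```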

```python
hidden_states_B_C_transposed = hidden_states_B_C.swapaxes(1, 2)
conv_states = mint.nn.functional.pad(
    hidden_states_B_C_transposed, (self.conv_kernel_size - hidden_states_B_C_transposed.shape[-1], 0)
)
cache_params.conv_states[self.layer_idx].copy_(conv_states)
```


critical

The logic for initializing the convolution cache is incorrect. On line 515, mint.nn.functional.pad is called with a padding size calculated as self.conv_kernel_size - hidden_states_B_C_transposed.shape[-1]. If the sequence length is greater than self.conv_kernel_size, this results in a negative padding size, which will cause a runtime error. The cache should be initialized with the last self.conv_kernel_size tokens of the input sequence, padding only if the sequence is shorter.

```python
            if cache_params is not None:
                hidden_states_B_C_transposed = hidden_states_B_C.swapaxes(1, 2)
                seq_len = hidden_states_B_C_transposed.shape[-1]
                conv_kernel_size = self.conv_kernel_size
                if seq_len < conv_kernel_size:
                    conv_states = mint.nn.functional.pad(
                        hidden_states_B_C_transposed, (conv_kernel_size - seq_len, 0)
                    )
                else:
                    conv_states = hidden_states_B_C_transposed[..., -conv_kernel_size:]
                cache_params.conv_states[self.layer_idx].copy_(conv_states)
```
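
To make the failure mode concrete, here is a small shape sketch with made-up values (assuming `mint.ones` behaves like its PyTorch counterpart):

```python
# Illustrative values only: with conv_kernel_size = 4 and seq_len = 6, the
# original code computes a negative padding amount, while the fix keeps the
# last conv_kernel_size positions of the sequence.
from mindspore import mint

x = mint.ones((1, 8, 6))                  # (batch, channels, seq_len)
conv_kernel_size = 4
pad = conv_kernel_size - x.shape[-1]      # 4 - 6 = -2 -> invalid padding size
conv_states = x[..., -conv_kernel_size:]  # shape (1, 8, 4), as in the fix above
print(pad, conv_states.shape)
```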

````python
        >>> outputs = model(**inputs)
        ```"""
        if labels is not None:
            raise ValueError("SuperGlue is not trainable, no labels should be provided.")
````


medium

The error message incorrectly refers to "SuperGlue" instead of "EfficientLoFTR". This appears to be a copy-paste error and could be confusing for users.

Suggested change

```diff
-raise ValueError("SuperGlue is not trainable, no labels should be provided.")
+raise ValueError("EfficientLoFTR is not trainable, no labels should be provided.")
```

@The-truthh The-truthh force-pushed the transformers-aimv2 branch 2 times, most recently from a0f245a to 5a12455 on October 30, 2025 07:45
@The-truthh The-truthh force-pushed the transformers-aimv2 branch 13 times, most recently from f8122a9 to 3241943 on November 3, 2025 06:48