feat(transformers): add T5Gemma (v4.54.1) #1420
base: master
Conversation
Summary of Changes
Hello @alien-0119, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the library's model support by adding the T5Gemma model.
Code Review
This pull request adds support for the T5Gemma model, including its configuration, model definition, and tests. The implementation looks solid, but there are a few areas for improvement. I've pointed out some issues related to code consistency, such as alphabetical ordering of imports and mappings. There's also a potential bug in the model parallelism configuration (_no_split_modules) and some code duplication that could be refactored. Additionally, I've suggested replacing a wildcard import with explicit imports for better code clarity.
    config: T5GemmaConfig
    base_model_prefix = "model"
    supports_gradient_checkpointing = True
    _no_split_modules = ["T5GemmaBlock"]
The _no_split_modules list contains "T5GemmaBlock", but this class is not defined in the file. This will likely cause issues with model parallelism. It seems it should refer to the layer classes, like T5GemmaEncoderLayer and T5GemmaDecoderLayer.
Suggested change:
    - _no_split_modules = ["T5GemmaBlock"]
    + _no_split_modules = ["T5GemmaEncoderLayer", "T5GemmaDecoderLayer"]
| ("vipllava", "VipLlavaForConditionalGeneration"), | ||
| ("visual_bert", "VisualBertForPreTraining"), | ||
| ("vit_mae", "ViTMAEForPreTraining"), | ||
| ("t5gemma", "T5GemmaForConditionalGeneration"), |
The new entries for t5gemma are not in alphabetical order in several mapping lists (MODEL_FOR_PRETRAINING_MAPPING_NAMES, MODEL_WITH_LM_HEAD_MAPPING_NAMES, MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES). To maintain consistency within the file, please place them in their correct alphabetical positions, usually after the corresponding t5 entry.
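For illustration, a sketch of the expected placement in MODEL_FOR_PRETRAINING_MAPPING_NAMES; the surrounding entries are elided, and the class name on the existing t5 entry is assumed from the upstream transformers mapping and may differ in this repository:

    from collections import OrderedDict

    MODEL_FOR_PRETRAINING_MAPPING_NAMES = OrderedDict(
        [
            # ... entries before "t5", kept alphabetical ...
            ("t5", "T5ForConditionalGeneration"),  # existing entry (class name assumed)
            ("t5gemma", "T5GemmaForConditionalGeneration"),  # new entry, placed right after "t5"
            # ... entries after "t5gemma", kept alphabetical ...
        ]
    )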
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    from .modeling_t5gemma import *
Wildcard imports (*) are discouraged as they can pollute the namespace and make it unclear which names are being imported. Please use explicit imports and define __all__ to control what is exported from the module. It is also good practice to keep the imported names and __all__ list sorted alphabetically.
Suggested change:
    - from .modeling_t5gemma import *
    + from .modeling_t5gemma import (
    +     T5GemmaEncoderModel,
    +     T5GemmaForConditionalGeneration,
    +     T5GemmaForSequenceClassification,
    +     T5GemmaForTokenClassification,
    +     T5GemmaModel,
    +     T5GemmaPreTrainedModel,
    + )
    +
    + __all__ = [
    +     "T5GemmaEncoderModel",
    +     "T5GemmaForConditionalGeneration",
    +     "T5GemmaForSequenceClassification",
    +     "T5GemmaForTokenClassification",
    +     "T5GemmaModel",
    +     "T5GemmaPreTrainedModel",
    + ]
    _can_record_outputs = {
        "hidden_states": T5GemmaDecoderLayer,
        "attentions": T5GemmaAttention,
    }
The T5GemmaAttention class is used here for recording attention outputs, but it appears to be a redundant implementation of self-attention, very similar to T5GemmaSelfAttention. Other parts of the model like T5GemmaEncoder already use T5GemmaSelfAttention for this purpose. To reduce code duplication and improve maintainability, consider using T5GemmaSelfAttention here and removing the T5GemmaAttention class.
Suggested change:
    - _can_record_outputs = {
    -     "hidden_states": T5GemmaDecoderLayer,
    -     "attentions": T5GemmaAttention,
    - }
    + _can_record_outputs = {
    +     "hidden_states": T5GemmaDecoderLayer,
    +     "attentions": T5GemmaSelfAttention,
    + }
Force-pushed from d45c3bb to 0c8ddbb (compare)
Force-pushed from 0c8ddbb to 0855aa9 (compare)
What does this PR do?
Adds # (feature)
Adds the T5Gemma model and fast unit tests.
Depends on #1412.
Usage Example:
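A minimal sketch of the expected usage, assuming the mindone.transformers API mirrors the Hugging Face from_pretrained/generate interface for this model; the checkpoint name below is a placeholder, not a confirmed release:

    import mindspore as ms
    from transformers import AutoTokenizer
    from mindone.transformers import T5GemmaForConditionalGeneration

    # Placeholder checkpoint id; substitute an actual T5Gemma checkpoint.
    model_id = "google/t5gemma-placeholder"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = T5GemmaForConditionalGeneration.from_pretrained(model_id)

    # Tokenize to numpy arrays, then wrap the ids as MindSpore tensors for the model.
    inputs = tokenizer("Translate English to German: How are you?", return_tensors="np")
    input_ids = ms.Tensor(inputs.input_ids)

    # generate is assumed to return generated token ids; decode the first sequence.
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))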
Performance:
Experiments were run on Ascend Atlas 800T A2 machines with MindSpore 2.7.0 in PyNative mode.
Before submitting
What's New. Here are the documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@xxx