-
Notifications
You must be signed in to change notification settings - Fork 254
Add a variable to skip the machine-type validation for TPUs #4766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a variable to skip the machine-type validation for TPUs #4766
Conversation
Summary of ChangesHello @agrawalkhushi18, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an issue where GKE node pools utilizing aggregate TPU reservations would fail during the Terraform plan stage due to a missing Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
The pull request introduces a mechanism to bypass machine-type validation for TPU node pools by checking the placement_policy.tpu_topology
setting. This addresses an issue where the existing validation logic fails for TPUs due to the absence of a machine-type field in TPU reservations. The changes involve adding a local variable is_tpu
to determine if the node pool is a TPU and conditionally skipping the machine-type validation based on this variable. The code changes look reasonable, and the comments are helpful. I have added a review comment to improve the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
In long term, we may need a robust way to check TPU specific reservations attributes. This may require the corresponding tf data sources to expose the tpu specific properties.
The current PR-test failures appear to be unrelated to the changes introduced in this PR. Merging it after approval. |
Agreed. This long-term enhancement will be tracked as a child task to expose the necessary TPU reservation properties. |
35e73d1
into
GoogleCloudPlatform:develop
…blueprint Add a variable to skip the machine-type validation for TPUs
This change bypasses the node pool's machine-type validation when using aggregate TPU reservations. The existing precondition block fails for TPUs because these reservations do not expose a machine-type field, unlike CPU or GPU reservations, which causes a mismatch error during the terraform plan stage. This fix uses the
placement_policy.tpu_topology
setting to reliably identify a TPU node pool and conditionally skip this specific validation, unblocking the deployment.Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.