fix: include zero value local_ssd_ephemeral_storage_count for GKE nodepool disktype=hyperdisk-balanced… #2474
base: main
Conversation
… nodepool machine_type
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Summary of Changes
Hello @shlomimn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue where Terraform would incorrectly attempt to replace GKE nodepools configured with `disk_type = hyperdisk-balanced`.
Code Review
This pull request aims to fix an issue where Terraform produces a perpetual diff for GKE node pools using the `hyperdisk-balanced` disk type, because the GKE API expects the `ephemeralStorageLocalSsdConfig` block even with zero counts. While the intent is correct, the implementation has a couple of issues. First, the change is made to a generated file (`modules/beta-public-cluster/cluster.tf`) and should be applied to the source template (`autogen/main/cluster.tf.tmpl`) to be effective and persistent. Second, the proposed logic `>= 0` is too broad and will add the `ephemeral_storage_local_ssd_config` block to all node pools, which could cause issues for configurations that do not support it. I've suggested a more targeted fix that applies the logic only when necessary. It would also be beneficial to add an integration test to cover this new scenario.
```diff
 dynamic "ephemeral_storage_local_ssd_config" {
-  for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : []
+  for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) >= 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : []
```
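To see why the proposed `>= 0` condition matches every node pool, here is a quick check in `terraform console` (the one-key map below is a made-up stand-in for a node-pool entry):

```text
> lookup({ name = "event-proc" }, "local_ssd_ephemeral_storage_count", 0)
0
> lookup({ name = "event-proc" }, "local_ssd_ephemeral_storage_count", 0) >= 0
true
```

The default `0` already satisfies `>= 0`, so with this condition the block would be rendered for every node pool, not only `hyperdisk-balanced` ones.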
This change should be applied to the template file `autogen/main/cluster.tf.tmpl` instead of this generated file; otherwise, it will be overwritten.

Additionally, the condition `lookup(each.value, "local_ssd_ephemeral_storage_count", 0) >= 0` is too broad. Since `lookup` with a default value of `0` will always return a number greater than or equal to zero, this will cause the `ephemeral_storage_local_ssd_config` block to be created for all node pools, which may have unintended consequences for node pools that do not support this configuration.

Based on your description, the issue is specific to `disk_type = "hyperdisk-balanced"`. A safer approach is to add this specific case to the condition, preserving the original logic for other cases:

```hcl
for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 || lookup(each.value, "disk_type", "pd-standard") == "hyperdisk-balanced" ? [1] : []
```
Regarding the suggested answer: since we are dealing with SSDs, and since my solution was flagged as too broad, I think we can apply the following change instead:

```hcl
dynamic "ephemeral_storage_local_ssd_config" {
  for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 || lookup(each.value, "disk_type", "pd-ssd") == "hyperdisk-balanced" ? [1] : []
  content {
    local_ssd_count  = lookup(each.value, "local_ssd_ephemeral_storage_count", 0)
    data_cache_count = lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0)
  }
}
```
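One difference from the earlier suggestion worth noting: this version falls back to `"pd-ssd"` rather than `"pd-standard"` when `disk_type` is unset. The fallback only matters when the key is absent, and since neither value equals `"hyperdisk-balanced"`, both variants gate the block identically.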
…eation related to disk_type=hyperdisk-balanced to narrow the solution scope
Overview
I recently tried to create a new nodepool with machine_type=c4-highcpu-4 and disk_type=hyperdisk-balanced.
My GCP managed cluster control plane version is 1.32.8-gke.1134000.
I am using Terraform to add the new GKE nodepool.
Since my nodepool uses disk_type=hyperdisk-balanced (and not disk_type=pd-ssd), the GKE API always assumes that the node pool might want ephemeral storage (this disk type is typically chosen for high-performance workloads), so it always exposes ephemeralStorageLocalSsdConfig: {} even if local_ssd_count = 0.
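For reference, a minimal node-pool definition that reproduces this (the name, machine type, and disk type are taken from above; the surrounding module wiring is assumed):

```hcl
node_pools = [
  {
    name         = "event-proc"
    machine_type = "c4-highcpu-4"
    disk_type    = "hyperdisk-balanced"
    # local_ssd_ephemeral_storage_count deliberately not set (the module defaults it to 0)
  },
]
```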
Problem
Following the overview above, the API wants to add ephemeralStorageLocalSsdConfig: {} when creating a new nodepool with disk_type=hyperdisk-balanced, but the module code then checks whether local_ssd_count exists.
Since local_ssd_count is not set in my nodepool configuration, Terraform wants to omit ephemeralStorageLocalSsdConfig: {}, so every plan sees a difference between the state and what the API reports.
Relevant Code in cluster.tf
https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/main/modules/beta-private-cluster/cluster.tf

```hcl
dynamic "ephemeral_storage_local_ssd_config" {
  for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : []
  content {
    local_ssd_count  = lookup(each.value, "local_ssd_ephemeral_storage_count", 0)
    data_cache_count = lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0)
  }
}
```

With local_ssd_ephemeral_storage_count unset, both lookups return the default 0, the `for_each` condition is false, and the block is omitted, which conflicts with what the API returns.
Final Result
Finally, terraform plan shows: Plan: 1 to add, 0 to change, 1 to destroy.

```text
# module.gke.google_container_cluster.primary has changed
~ resource "google_container_cluster" "primary" {
    ...

Terraform will perform the following actions:

  # module.gke.google_container_node_pool.pools["event-proc"] must be replaced
-/+ resource "google_container_node_pool" "pools" {
    ...
```
Suggested Solution
Add an option in the code not to remove the dynamic "ephemeral_storage_local_ssd_config" block when local_ssd_count = 0. This aligns Terraform with the API, which exposes ephemeralStorageLocalSsdConfig: {} even if local_ssd_count = 0.
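Concretely, the narrowed fix would look like this in autogen/main/cluster.tf.tmpl (a sketch combining the existing block with the extra disk_type check from the review; the `"pd-standard"` fallback is an assumption, and any other non-hyperdisk fallback behaves the same):

```hcl
dynamic "ephemeral_storage_local_ssd_config" {
  # Also emit the (possibly zero-count) block for hyperdisk-balanced pools,
  # matching the ephemeralStorageLocalSsdConfig: {} the API always returns.
  for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 || lookup(each.value, "disk_type", "pd-standard") == "hyperdisk-balanced" ? [1] : []
  content {
    local_ssd_count  = lookup(each.value, "local_ssd_ephemeral_storage_count", 0)
    data_cache_count = lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0)
  }
}
```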