
Conversation

shlomimn

Preview

I recently tried to create a new nodepool with machine_type=c4-highcpu-4 and disk_type=hyperdisk-balanced.
My GCP-managed cluster control plane version is 1.32.8-gke.1134000.
I am using Terraform to add the new GKE nodepool.
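
For reference, the node pool was defined roughly as follows (a minimal sketch, not the exact configuration; apart from machine_type, disk_type, and the pool name, everything here, including the module source path, is an assumption for illustration):

module "gke" {
  source = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  # ... project, network, and other required inputs omitted ...

  node_pools = [
    {
      name         = "event-proc"
      machine_type = "c4-highcpu-4"
      disk_type    = "hyperdisk-balanced"
      # note: no local_ssd_ephemeral_storage_count is set for this pool
    },
  ]
}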

Since my nodepool uses disk_type=hyperdisk-balanced (rather than disk_type=pd-ssd), the GKE API assumes the node pool might want ephemeral storage (this disk type is typically chosen for high-performance workloads), so it always exposes ephemeralStorageLocalSsdConfig: {} even when local_ssd_count = 0.

Problem

Following the preview above, the API wants to add ephemeralStorageLocalSsdConfig: {} when creating a new nodepool with disk_type=hyperdisk-balanced, but the module code only checks whether local_ssd_count is set.
Since local_ssd_count is not set in my nodepool configuration, Terraform wants to omit ephemeralStorageLocalSsdConfig: {}.

Relevant Code in cluster.tf

https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/main/modules/beta-private-cluster/cluster.tf
dynamic "ephemeral_storage_local_ssd_config" { for_each = **lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0** || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : [] content { local_ssd_count = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) data_cache_count = lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) } }

Final Result

The terraform plan then shows: Plan: 1 to add, 0 to change, 1 to destroy.

# module.gke.google_container_cluster.primary has changed
~ resource "google_container_cluster" "primary" {
    ...
    + ephemeral_storage_local_ssd_config {
        + local_ssd_count = 0
      }
    ...
  }

Terraform will perform the following actions:

# module.gke.google_container_node_pool.pools["event-proc"] must be replaced
-/+ resource "google_container_node_pool" "pools" {
      ...
      - ephemeral_storage_local_ssd_config { # forces replacement
          - local_ssd_count = 0 -> null
        }
      ...
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Suggested Solution

If the code had an option not to remove the dynamic "ephemeral_storage_local_ssd_config" block when local_ssd_count = 0, this would align with the API, which exposes ephemeralStorageLocalSsdConfig: {} even when local_ssd_count = 0.

@shlomimn shlomimn requested review from a team, apeabody and ericyz as code owners October 18, 2025 12:52

google-cla bot commented Oct 18, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.


Summary of Changes

Hello @shlomimn, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where Terraform would incorrectly attempt to replace GKE nodepools configured with hyperdisk-balanced disk types. The GKE API expects the ephemeralStorageLocalSsdConfig block to always be present for these nodepools, even if no local SSDs are specified (i.e., local_ssd_count is zero). Previously, Terraform's configuration only included this block if local_ssd_ephemeral_storage_count was greater than zero, leading to a discrepancy and unnecessary resource recreation. The change updates the conditional logic to include the configuration block when the count is zero or greater, ensuring proper alignment with the GKE API and preventing unwanted nodepool replacements.

Highlights

  • GKE Nodepool Configuration: Modified the condition for including ephemeral_storage_local_ssd_config in GKE nodepool configurations to allow for zero local_ssd_ephemeral_storage_count when using hyperdisk-balanced disk types, aligning with GKE API expectations.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix an issue where Terraform produces a perpetual diff for GKE node pools using hyperdisk-balanced disk type, because the GKE API expects the ephemeralStorageLocalSsdConfig block even with zero counts. While the intent is correct, the implementation has a couple of issues. First, the change is made to a generated file (modules/beta-public-cluster/cluster.tf) and should be applied to the source template (autogen/main/cluster.tf.tmpl) to be effective and persistent. Second, the proposed logic >= 0 is too broad and will add the ephemeral_storage_local_ssd_config block to all node pools, which could cause issues for configurations that do not support it. I've suggested a more targeted fix that applies the logic only when necessary. It would also be beneficial to add an integration test to cover this new scenario.


dynamic "ephemeral_storage_local_ssd_config" {
for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : []
for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) >= 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : []


critical

This change should be applied to the template file autogen/main/cluster.tf.tmpl instead of this generated file. Otherwise, it will be overwritten.

Additionally, the condition lookup(each.value, "local_ssd_ephemeral_storage_count", 0) >= 0 is too broad. Since lookup with a default value of 0 will always return a number greater than or equal to zero, this will cause the ephemeral_storage_local_ssd_config block to be created for all node pools, which may have unintended consequences for node pools that do not support this configuration.

Based on your description, the issue is specific to disk_type = "hyperdisk-balanced". A safer approach is to add this specific case to the condition, preserving the original logic for other cases.

      for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 || lookup(each.value, "disk_type", "pd-standard") == "hyperdisk-balanced" ? [1] : []
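
As a quick illustration of why the >= 0 form fires for every pool, here is an assumed terraform console session (the map values are made up for the example):

> lookup({ machine_type = "c4-highcpu-4" }, "local_ssd_ephemeral_storage_count", 0)
0
> lookup({ machine_type = "c4-highcpu-4" }, "local_ssd_ephemeral_storage_count", 0) >= 0
true
> lookup({ machine_type = "c4-highcpu-4", disk_type = "hyperdisk-balanced" }, "disk_type", "pd-standard") == "hyperdisk-balanced"
true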

Author

@shlomimn shlomimn Oct 18, 2025


Regarding the suggested answer: since we are dealing with SSDs and my solution was flagged as too broad, I think we can add the following change instead:

dynamic "ephemeral_storage_local_ssd_config" {
for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) || lookup(each.value, "disk_type", "pd-ssd") == "hyperdisk-balanced" > 0 ? [1] : []
content {
local_ssd_count = lookup(each.value, "local_ssd_ephemeral_storage_count", 0)
data_cache_count = lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0)
}
}
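
With that condition, a hyperdisk-balanced pool that sets neither count would evaluate roughly as follows (a sketch based on the pool described in the PR body):

# lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0            => false
# lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 => false
# lookup(each.value, "disk_type", "pd-ssd") == "hyperdisk-balanced"         => true
# for_each = [1]  => the block is rendered with local_ssd_count = 0,
#                    matching the API's ephemeralStorageLocalSsdConfig: {}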

@shlomimn shlomimn changed the title fix: include zero value local_ssd_ephemeral_storage_count for highcpu… fix: include zero value local_ssd_ephemeral_storage_count for disktype=hyperdisk-balanced… Oct 18, 2025
@shlomimn shlomimn changed the title fix: include zero value local_ssd_ephemeral_storage_count for disktype=hyperdisk-balanced… fix: include zero value local_ssd_ephemeral_storage_count for GKE nodepool disktype=hyperdisk-balanced… Oct 18, 2025