
Node pool forced replacement due to GCP API regional inconsistency with ephemeralStorageLocalSsdConfig #2465

@patjlm

Description

TL;DR

Node pools with C3 machine types in europe-west1 show perpetual drift and forced replacements on every terraform plan, while identical configurations in us-central1 work correctly. The GCP API returns different values for ephemeralStorageLocalSsdConfig based on region + machine type combination, and the module does not handle this inconsistency gracefully. Reproduced with modules/private-cluster, but other cluster modules may be affected as well.

Expected behavior

After terraform apply successfully creates a GKE cluster with node pools, running terraform plan should show no changes.

Observed behavior

  • us-central1: terraform plan shows no changes
  • europe-west1: terraform plan forces node pool replacement
# module.gke.google_container_node_pool.pools["pool"] must be replaced
  ~ node_config {
      - ephemeral_storage_local_ssd_config {  # forces replacement
          - data_cache_count = 0 -> null
          - local_ssd_count  = 0 -> null
        }
    }

Root Cause Verified via Direct GCP API Calls:

# us-central1
gcloud container node-pools describe pool \
  --cluster=reproducer-us --region=us-central1 \
  --project=YOUR_PROJECT --format=json \
  | jq '.config.ephemeralStorageLocalSsdConfig'
# Output: null

# europe-west1
gcloud container node-pools describe pool \
  --cluster=reproducer-eu --region=europe-west1 \
  --project=YOUR_PROJECT --format=json \
  | jq '.config.ephemeralStorageLocalSsdConfig'
# Output: {}

The module's dynamic block only creates the ephemeral_storage_local_ssd_config block when either count is greater than zero:

dynamic "ephemeral_storage_local_ssd_config" {
  for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 ||
             lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : []
  # ...
}

Result: Terraform never sends the block, but the europe-west1 GCP API adds it to the node config automatically with zero values. The provider then sees a block in state that the configuration cannot reproduce, and because the field forces replacement, every plan proposes destroying and recreating the node pool.
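
For illustration only: to converge with what the europe-west1 API returns, the provider would have to be given the block explicitly with zero values, roughly as in the raw-resource sketch below (cluster and project names reused from the reproducer; whether the provider and API actually accept an explicit zero-value block here has not been verified). The module's > 0 condition can never produce this shape.

resource "google_container_node_pool" "pool" {
  name     = "pool"
  project  = "YOUR_PROJECT"
  location = "europe-west1"
  cluster  = "reproducer-eu"

  node_config {
    machine_type = "c3-standard-4"

    # Explicit zero-value block, mirroring what the europe-west1 API reports back.
    ephemeral_storage_local_ssd_config {
      local_ssd_count  = 0
      data_cache_count = 0
    }
  }
}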

Terraform Configuration

module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version = "40.0.0"

  project_id             = "my-project"
  name                   = "test-cluster"
  region                 = "europe-west1"
  network                = "my-vpc"
  subnetwork             = "my-subnet"
  ip_range_pods          = "gke-pods"
  ip_range_services      = "gke-services"
  enable_private_nodes   = true
  master_ipv4_cidr_block = "172.16.0.0/28"

  node_pools = [{
    name         = "pool"
    machine_type = "c3-standard-4"
    min_count    = 1
    max_count    = 1
  }]
}


Steps to Reproduce:

terraform init
terraform apply  # Creates successfully
terraform plan   # Shows node pool replacement in europe-west1 only

Terraform Version

- Terraform: v1.13+

Terraform Provider Versions

- Google Provider: v6.50.0
- Module Versions Tested: v38.1.0, v40.0.0

Additional information

Impact:

  • Prevents reliable Terraform management in europe-west1 with C3 machine types
  • Affects any module user running C3 machine types in europe-west1 (and possibly other regions and machine types)
  • Risk of accidental production node pool destruction

Environment:

  • Regions: europe-west1 (affected), us-central1 (not affected)
  • GKE: 1.33.4-gke.1245000
  • Machine Type: c3-standard-4 (affected), e2-medium (NOT affected)
  • Note: We did not perform a comprehensive test of all regions and machine types

Testing:

  • Reproduced with both module v38.1.0 and v40.0.0
  • Issue confirmed via direct gcloud API calls (no Terraform involved)
  • Discovered: 2025-10-08

Possible Workaround:
One potential module-side mitigation could be to always create the ephemeral_storage_local_ssd_config block when either parameter is explicitly set (even with zero values), which would keep the configuration consistent with regions that return empty objects. However, this may not cover all cases, and the underlying API inconsistency would still need to be addressed by GCP. A rough sketch of the idea follows.
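
The snippet below is only an illustration of that idea, not the module's actual implementation (key names are taken from the existing dynamic block above, and it is untested). It also would not help callers who do not set either key at all.

# Sketch only: emit the block whenever either key is explicitly set on the
# node pool, even if the value is zero, instead of requiring a value > 0.
dynamic "ephemeral_storage_local_ssd_config" {
  for_each = contains(keys(each.value), "local_ssd_ephemeral_storage_count") ||
             contains(keys(each.value), "ephemeral_storage_local_ssd_data_cache_count") ? [1] : []
  content {
    local_ssd_count  = lookup(each.value, "local_ssd_ephemeral_storage_count", 0)
    data_cache_count = lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0)
  }
}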

Similar Issues:
This follows a similar pattern to issue #2100 (gcfsConfig drift), which was resolved in Google provider v6.4.0. However, this issue is unique in that it combines regional API inconsistency with the ephemeralStorageLocalSsdConfig field.
