Description
TL;DR
Node pools with C3 machine types in europe-west1 show perpetual drift and forced replacements on every `terraform plan`, while identical configurations in us-central1 work correctly. The GCP API returns different values for `ephemeralStorageLocalSsdConfig` based on the region + machine type combination, and the module does not handle this inconsistency gracefully. Reproduced with `modules/private-cluster`, but other cluster modules may be affected as well.
Expected behavior
After `terraform apply` successfully creates a GKE cluster with node pools, running `terraform plan` should show no changes.
Observed behavior
- ✅ us-central1: `terraform plan` shows no changes
- ❌ europe-west1: `terraform plan` forces node pool replacement
```
# module.gke.google_container_node_pool.pools["pool"] must be replaced
~ node_config {
    - ephemeral_storage_local_ssd_config { # forces replacement
        - data_cache_count = 0 -> null
        - local_ssd_count  = 0 -> null
      }
  }
```
Root Cause Verified via Direct GCP API Calls:
```bash
# us-central1
gcloud container node-pools describe pool \
  --cluster=reproducer-us --region=us-central1 \
  --project=YOUR_PROJECT --format=json \
  | jq '.config.ephemeralStorageLocalSsdConfig'
# Output: null

# europe-west1
gcloud container node-pools describe pool \
  --cluster=reproducer-eu --region=europe-west1 \
  --project=YOUR_PROJECT --format=json \
  | jq '.config.ephemeralStorageLocalSsdConfig'
# Output: {}
```

The module's dynamic block only creates `ephemeral_storage_local_ssd_config` when either value is greater than 0:
dynamic "ephemeral_storage_local_ssd_config" {
for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 ||
lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : []
# ...
}Result: Terraform doesn't send the block, but the europe-west1 GCP API adds it automatically with zeros, creating perpetual drift.
Terraform Configuration
module "gke" {
source = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
version = "40.0.0"
project_id = "my-project"
name = "test-cluster"
region = "europe-west1"
network = "my-vpc"
subnetwork = "my-subnet"
ip_range_pods = "gke-pods"
ip_range_services = "gke-services"
enable_private_nodes = true
master_ipv4_cidr_block = "172.16.0.0/28"
node_pools = [{
name = "pool"
machine_type = "c3-standard-4"
min_count = 1
max_count = 1
}]
}
**Steps to Reproduce:**
```bash
terraform init
terraform apply   # Creates successfully
terraform plan    # Shows node pool replacement in europe-west1 only
```

Terraform Version
- Terraform: v1.13+

Terraform Provider Versions
- Google Provider: v6.50.0
- Module Versions Tested: v38.1.0, v40.0.0

Additional information
Impact:
- Prevents reliable Terraform management in europe-west1 with C3 machine types
- Affects all module users running C3 machine types in europe-west1 (and possibly other regions/machine types)
- Risk of accidental production node pool destruction
Environment:
- Regions: europe-west1 (affected), us-central1 (not affected)
- GKE: 1.33.4-gke.1245000
- Machine Type: c3-standard-4 (affected), e2-medium (NOT affected)
- Note: We did not perform a comprehensive test of all regions and machine types
Testing:
- Reproduced with both module v38.1.0 and v40.0.0
- Issue confirmed via direct `gcloud` API calls (no Terraform involved)
- Discovered: 2025-10-08
Possible Workaround:
One potential module-side mitigation could be to always create the `ephemeral_storage_local_ssd_config` block when either parameter is explicitly set (even with a zero value), which would keep the configuration consistent with regions that return empty objects (see the sketch below). However, this may not resolve all cases, and the underlying API inconsistency would still need to be addressed by GCP.
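For illustration, a minimal sketch of that mitigation, assuming the dynamic block quoted above sits inside the module's `google_container_node_pool` resource, that `each.value` is the raw node pool map with numeric values, and that the nested argument names match the `local_ssd_count` / `data_cache_count` attributes shown in the plan output:

```hcl
# Sketch only: emit the block whenever either attribute is explicitly set,
# even with a value of 0, so the configuration Terraform sends matches
# regions whose API normalizes an absent block to an empty/zero object.
dynamic "ephemeral_storage_local_ssd_config" {
  for_each = (
    contains(keys(each.value), "local_ssd_ephemeral_storage_count") ||
    contains(keys(each.value), "ephemeral_storage_local_ssd_data_cache_count")
  ) ? [1] : []

  content {
    local_ssd_count  = lookup(each.value, "local_ssd_ephemeral_storage_count", 0)
    data_cache_count = lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0)
  }
}
```

Keying the block on attribute presence rather than value would let users opt in to an explicit zero-valued block in affected regions; node pools that set neither attribute would still omit it, so this narrows the drift rather than eliminating it.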
Similar Issues:
This follows a similar pattern to issue #2100 (gcfsConfig drift), which was resolved in Google provider v6.4.0. However, this issue is unique in that it combines regional API inconsistency with the ephemeralStorageLocalSsdConfig field.
References: