Description
TL;DR
Node pools with C3 machine types in europe-west1 show perpetual drift and forced replacements on every `terraform plan`, while identical configurations in us-central1 work correctly. The GCP API returns different values for `ephemeralStorageLocalSsdConfig` based on the region + machine type combination, and the module does not handle this inconsistency gracefully. Reproduced with `modules/private-cluster`, but other cluster modules may be affected as well.
Expected behavior
After `terraform apply` successfully creates a GKE cluster with node pools, running `terraform plan` should show no changes.
Observed behavior
- ✅ us-central1: `terraform plan` shows no changes
- ❌ europe-west1: `terraform plan` forces node pool replacement
```
# module.gke.google_container_node_pool.pools["pool"] must be replaced
~ node_config {
    - ephemeral_storage_local_ssd_config { # forces replacement
        - data_cache_count = 0 -> null
        - local_ssd_count  = 0 -> null
      }
  }
```
Root Cause Verified via Direct GCP API Calls:
```bash
# us-central1
gcloud container node-pools describe pool \
  --cluster=reproducer-us --region=us-central1 \
  --project=YOUR_PROJECT --format=json \
  | jq '.config.ephemeralStorageLocalSsdConfig'
# Output: null

# europe-west1
gcloud container node-pools describe pool \
  --cluster=reproducer-eu --region=europe-west1 \
  --project=YOUR_PROJECT --format=json \
  | jq '.config.ephemeralStorageLocalSsdConfig'
# Output: {}
```

The module's dynamic block only creates `ephemeral_storage_local_ssd_config` when either value is greater than 0:
dynamic "ephemeral_storage_local_ssd_config" {
for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 ||
lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : []
# ...
}Result: Terraform doesn't send the block, but the europe-west1 GCP API adds it automatically with zeros, creating perpetual drift.
Terraform Configuration
module "gke" {
source = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
version = "40.0.0"
project_id = "my-project"
name = "test-cluster"
region = "europe-west1"
network = "my-vpc"
subnetwork = "my-subnet"
ip_range_pods = "gke-pods"
ip_range_services = "gke-services"
enable_private_nodes = true
master_ipv4_cidr_block = "172.16.0.0/28"
node_pools = [{
name = "pool"
machine_type = "c3-standard-4"
min_count = 1
max_count = 1
}]
}
**Steps to Reproduce:**
```bash
terraform init
terraform apply   # Creates successfully
terraform plan    # Shows node pool replacement in europe-west1 only
```

Terraform Version
- Terraform: v1.13+

Terraform Provider Versions
- Google Provider: v6.50.0
- Module Versions Tested: v38.1.0, v40.0.0

Additional information
Impact:
- Prevents reliable Terraform management in europe-west1 with C3 machine types
- Affects all module users running C3 machine types in europe-west1 (and possibly other regions/machine types)
- Risk of accidental production node pool destruction
Environment:
- Regions: europe-west1 (affected), us-central1 (not affected)
- GKE: 1.33.4-gke.1245000
- Machine Type: c3-standard-4 (affected), e2-medium (NOT affected)
- Note: We did not perform a comprehensive test of all regions and machine types
Testing:
- Reproduced with both module v38.1.0 and v40.0.0
- Issue confirmed via direct `gcloud` API calls (no Terraform involved)
- Discovered: 2025-10-08
Possible Workaround:
One potential module-side mitigation could be to always create the `ephemeral_storage_local_ssd_config` block when either parameter is explicitly set (even with a zero value), which would keep the configuration consistent with regions that return empty objects (see the sketch below). However, this may not resolve all cases, and the underlying API inconsistency would still need to be addressed by GCP.
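For illustration, a minimal sketch of that mitigation, assuming the dynamic block quoted above sits inside the module's `google_container_node_pool` resource, that `each.value` is the raw node pool map with numeric values, and that the nested argument names match the `local_ssd_count` / `data_cache_count` attributes shown in the plan output:

```hcl
# Sketch only: emit the block whenever either attribute is explicitly set,
# even with a value of 0, so the configuration Terraform sends matches
# regions whose API normalizes an absent block to an empty/zero object.
dynamic "ephemeral_storage_local_ssd_config" {
  for_each = (
    contains(keys(each.value), "local_ssd_ephemeral_storage_count") ||
    contains(keys(each.value), "ephemeral_storage_local_ssd_data_cache_count")
  ) ? [1] : []

  content {
    local_ssd_count  = lookup(each.value, "local_ssd_ephemeral_storage_count", 0)
    data_cache_count = lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0)
  }
}
```

Keying the block on attribute presence rather than value would let users opt in to an explicit zero-valued block in affected regions; node pools that set neither attribute would still omit it, so this narrows the drift rather than eliminating it.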
Similar Issues:
This follows a similar pattern to issue #2100 (gcfsConfig drift), which was resolved in Google provider v6.4.0. However, this issue is unique in that it combines regional API inconsistency with the ephemeralStorageLocalSsdConfig field.
References: