Describe the bug
After the cluster nodes are reset, the cluster state becomes abnormal and shards cannot be allocated. Shards stuck in the UNASSIGNED/INITIALIZING state are never recovered; every recovery attempt fails with a ShardNotInPrimaryModeException("Shard is not in primary mode") (see the sketch after the shard listing below).
security-auditlog-2025.08.26 0 p STARTED 30 145.7kb 192.168.6.2 opensearch-node-base2
security-auditlog-2025.08.26 0 r STARTED 30 258.3kb 192.168.6.3 opensearch-node-base1
.plugins-ml-config 0 p STARTED 1 4kb 192.168.6.2 opensearch-node-base2
.plugins-ml-config 0 r STARTED 1 4kb 192.168.6.3 opensearch-node-base1
.plugins-ml-config 0 r UNASSIGNED
.opensearch-observability 0 p STARTED 0 208b 192.168.6.2 opensearch-node-base2
.opensearch-observability 0 r STARTED 0 208b 192.168.6.3 opensearch-node-base1
.opensearch-observability 0 r UNASSIGNED
.ql-datasources 0 p STARTED 0 208b 192.168.6.2 opensearch-node-base2
.ql-datasources 0 r UNASSIGNED
.ql-datasources 0 r UNASSIGNED
.opensearch-sap-log-types-config 0 p STARTED 192.168.6.2 opensearch-node-base2
.opensearch-sap-log-types-config 0 r INITIALIZING 192.168.6.3 opensearch-node-base1
.opensearch-sap-log-types-config 0 r UNASSIGNED
.kibana_92668751_admin_1 0 r STARTED 1 5.3kb 192.168.6.2 opensearch-node-base2
.kibana_92668751_admin_1 0 p STARTED 1 5.3kb 192.168.6.3 opensearch-node-base1
top_queries-2025.08.26-70653 0 p STARTED 27 59.9kb 192.168.6.2 opensearch-node-base2
top_queries-2025.08.26-70653 0 r STARTED 27 59.8kb 192.168.6.3 opensearch-node-base1
.opendistro_security 0 p STARTED 10 83.1kb 192.168.6.2 opensearch-node-base2
.opendistro_security 0 r STARTED 10 55.9kb 192.168.6.3 opensearch-node-base1
.opendistro_security 0 r UNASSIGNED
.kibana_1 0 p STARTED 0 208b 192.168.6.2 opensearch-node-base2
.kibana_1 0 r INITIALIZING 192.168.6.3 opensearch-node-base1
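For context, the error in the bug description comes from a primary-mode check in the shard's replication layer. A minimal, self-contained sketch of that kind of guard (hypothetical names; not the actual OpenSearch implementation) is shown below:

// Minimal sketch (hypothetical names, not OpenSearch source): a shard that has
// not entered primary mode rejects primary-level operations such as peer
// recovery, which is the failure reported in this issue.
class PrimaryModeGuardSketch {
    private final boolean primaryMode;

    PrimaryModeGuardSketch(boolean primaryMode) {
        this.primaryMode = primaryMode;
    }

    void acquirePrimaryOperationPermit() {
        if (!primaryMode) {
            // OpenSearch surfaces this as ShardNotInPrimaryModeException.
            throw new IllegalStateException("Shard is not in primary mode");
        }
        // ... otherwise the operation proceeds.
    }
}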
Related component
Cluster Manager
To Reproduce
Step 1: Create a three-node cluster (Node1, Node2, Node3).
Step 2: Stop Node1 and wait for the cluster state to become green.
Step 3: Stop Node2 and wait for 1 minute.
Step 4: Start Node2 and wait for 1 minute.
Step 5: Start Node1.
The issue reproduces when Node3 ends up as the cluster manager (master) node after Node2 is started in Step 4.
Expected behavior
The cluster state returns to green.
Additional Details
The root cause is that JoinTaskExecutor uses currentState.metadata to overwrite newState.metadata, which makes the Metadata inconsistent with the RoutingTable: the primary-term increment applied by AllocationService.disassociateDeadNodes is discarded, so the promoted shard never observes a higher primary term. The detailed process is shown in the sequence diagram below, followed by a simplified sketch of the promotion check:
@startuml
participant JoinTaskExecutor
participant AllocationService
participant ReplicationTracker
database ClusterState
JoinTaskExecutor <-- ClusterState : currentState
JoinTaskExecutor --> AllocationService: disassociateDeadNodes
AllocationService --> AllocationService : 1. update <font color="red">RoutingTable & IndexMetadata</font>
note left
Primary shard changes and PrimaryTerm += 1.
endnote
JoinTaskExecutor <-- AllocationService : newState
JoinTaskExecutor --> JoinTaskExecutor : <font color="red">2. newState.metadata = updateMetadataWithRepositoriesMetadata(currentState)</font>
note left
PrimaryTerm is reset to the value from currentState.
endnote
JoinTaskExecutor --> ClusterState : newState
ReplicationTracker <-- ClusterState : newState
ReplicationTracker --> ReplicationTracker : <font color="red">3. newPrimaryTerm is equal to pendingPrimaryTerm</font>
note left
The promoted replica shard is not activated via activatePrimaryMode since the PrimaryTerm is the same as the local one.
endnote
@enduml
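Step 3 in the diagram corresponds to the promotion path in the shard layer. A simplified, self-contained sketch (illustrative names only, not the real IndexShard/ReplicationTracker code) of why an unchanged primary term blocks the promotion:

// Simplified illustration (not OpenSearch source): primary promotion is keyed
// off a primary-term increase, so a cluster state whose metadata was reset to
// the old term never triggers primary-mode activation.
class PrimaryPromotionSketch {
    private long pendingPrimaryTerm = 1;   // term the shard already knows locally
    private boolean primaryMode = false;   // what the replication tracker tracks

    // Called when a new cluster state promotes this shard to primary.
    void applyPromotion(long newPrimaryTerm) {
        if (newPrimaryTerm > pendingPrimaryTerm) {
            pendingPrimaryTerm = newPrimaryTerm;
            primaryMode = true;            // stands in for activatePrimaryMode()
        }
        // Buggy case from the diagram: the metadata overwrite resets the term,
        // so newPrimaryTerm == pendingPrimaryTerm and the shard never enters
        // primary mode; later operations then hit ShardNotInPrimaryModeException.
    }

    boolean isPrimaryMode() {
        return primaryMode;
    }
}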
Problem code:
return results.build(
allocationService.adaptAutoExpandReplicas(
newState.nodes(nodesBuilder)
.metadata(updateMetadataWithRepositoriesMetadata(currentState.metadata(), repositoriesMetadata)) // this line
.build()
)
);
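One possible direction, shown only as a rough sketch and not a verified patch, is to layer the repositories metadata on top of the metadata already produced by disassociateDeadNodes (i.e. the metadata carried by newState), so the incremented primary terms are not discarded:

// Rough sketch of a possible fix direction (not a verified patch): build the
// intermediate state first, then apply the repositories metadata on top of the
// metadata that already contains the primary-term bump.
ClusterState updatedState = newState.nodes(nodesBuilder).build();
return results.build(
    allocationService.adaptAutoExpandReplicas(
        ClusterState.builder(updatedState)
            .metadata(updateMetadataWithRepositoriesMetadata(updatedState.metadata(), repositoriesMetadata))
            .build()
    )
);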