
[AUTOCUT] Gradle Check Flaky Test Report for SearchReplicaRestoreIT #17648

Open
opensearch-ci-bot opened this issue Mar 21, 2025 · 3 comments
Labels: autocut, flaky-test (Random test failure that succeeds on second run), >test-failure (Test failure from CI, local build, etc.), untriaged

@opensearch-ci-bot (Collaborator)

Flaky Test Report for SearchReplicaRestoreIT

Noticed that SearchReplicaRestoreIT has flaky tests that failed during post-merge actions.

Details

Git Reference: 9d4414b
Merged Pull Request: 17604
Build Details: 54813
Test Name: org.opensearch.indices.replication.SearchReplicaRestoreIT.testSearchReplicaRestore_WhenSnapshotOnSegRepWithSearchReplica_RestoreOnDocRep

The other pull requests, besides those involved in post-merge actions, that contain failing tests with the SearchReplicaRestoreIT class are:

For more details on the failed tests, refer to the OpenSearch Gradle Check Metrics dashboard.

opensearch-ci-bot added the >test-failure, autocut, flaky-test, and untriaged labels on Mar 21, 2025
@andrross (Member)

@mch2 @vinaykpud This looks like a fairly new test that is now flaky. Can we fix/remove/mute this so as not to add more flakiness?

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.indices.replication.SearchReplicaRestoreIT.testSearchReplicaRestore_WhenSnapshotOnSegRepWithSearchReplica_RestoreOnDocRep" -Dtests.seed=BC2DF12FF548A44D -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ann-Latn-NG -Dtests.timezone=US/Eastern -Druntime.java=21

SearchReplicaRestoreIT > testSearchReplicaRestore_WhenSnapshotOnSegRepWithSearchReplica_RestoreOnDocRep FAILED
    java.lang.AssertionError: Remote metadata file can't be null if shard is active STARTED
        at __randomizedtesting.SeedInfo.seed([BC2DF12FF548A44D]:0)
        at org.opensearch.indices.replication.RemoteStoreReplicationSource.getCheckpointMetadata(RemoteStoreReplicationSource.java:76)
        at org.opensearch.indices.replication.SegmentReplicationTarget.startReplication(SegmentReplicationTarget.java:179)
        at org.opensearch.indices.replication.SegmentReplicator.start(SegmentReplicator.java:275)
        at org.opensearch.indices.replication.SegmentReplicator$ReplicationRunner.doRun(SegmentReplicator.java:261)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:994)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)

@vinaykpud (Contributor) commented Mar 21, 2025

@andrross Yes, I will take a look at this.

@vinaykpud (Contributor) commented Mar 21, 2025

@andrross, @mch2
It looks like the flakiness was not introduced by this test; it is caused by this existing code:

// During initial recovery flow, the remote store might not
// have metadata as primary hasn't uploaded anything yet.
if (mdFile == null && indexShard.state().equals(IndexShardState.STARTED) == false) {
    listener.onResponse(new CheckpointInfoResponse(checkpoint, Collections.emptyMap(), null));
    return;
}
assert mdFile != null : "Remote metadata file can't be null if shard is active " + indexShard.state();
metadataMap = mdFile.getMetadata()

In the above block, the flakiness happens when indexShard.state() is not equal to IndexShardState.STARTED at the check on line 72, but is STARTED by line 76, so the assertion throws (line numbers refer to RemoteStoreReplicationSource.java, as in the stack trace above).

We need to check if we can handle this gracefully to avoid flakiness.
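
A minimal, self-contained sketch of one graceful-handling option (the ShardState enum, MetadataFile class, and method shape below are simplified stand-ins for the real OpenSearch types, not the actual RemoteStoreReplicationSource code): read the shard state once and respond with an empty metadata map whenever the remote metadata file is missing, instead of asserting, even if the shard has already raced to STARTED.

import java.util.Collections;
import java.util.Map;

// Simplified stand-ins for the real OpenSearch types; hypothetical, for illustration only.
enum ShardState { RECOVERING, STARTED }

class MetadataFile {
    Map<String, String> getMetadata() {
        return Map.of("segments_1", "checksum");
    }
}

public class CheckpointMetadataSketch {

    // Capture the shard state once, and fall back to an empty metadata map whenever the
    // remote metadata file is missing, rather than asserting on the shard state.
    static Map<String, String> checkpointMetadata(MetadataFile mdFile, ShardState observedState) {
        if (mdFile == null) {
            // Primary hasn't uploaded metadata yet. This covers the normal recovery case
            // and the racy case where the shard reports STARTED slightly too early.
            System.out.println("No remote metadata yet; observed shard state: " + observedState);
            return Collections.emptyMap();
        }
        return mdFile.getMetadata();
    }

    public static void main(String[] args) {
        // The flaky scenario from the assertion failure: shard STARTED, metadata still null.
        System.out.println(checkpointMetadata(null, ShardState.STARTED));
        // The normal scenario: metadata has been uploaded.
        System.out.println(checkpointMetadata(new MetadataFile(), ShardState.STARTED));
    }
}

Whether the actual fix should relax the assertion, retry, or tighten the state check needs to be decided against the replication flow; this only illustrates the shape of handling the null case without throwing.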
