Skip to content

Add interceptCheckpointUpdates to TieringStatusIT and TierCancelIT#21757

Open
GeekGlider wants to merge 1 commit into
opensearch-project:mainfrom
GeekGlider:feature/tiered-storage-skeleton
Open

Add interceptCheckpointUpdates to TieringStatusIT and TierCancelIT#21757
GeekGlider wants to merge 1 commit into
opensearch-project:mainfrom
GeekGlider:feature/tiered-storage-skeleton

Conversation

@GeekGlider
Copy link
Copy Markdown
Contributor

Description

Adds interceptCheckpointUpdates fix to TieringStatusIT and TierCancelIT to resolve theReplicationTracker assertion race condition (GitHub #3923).

During tiering operations with segment replication, a checkpoint update can arrive after relocation handoff begins, triggering an assertion in ReplicationTracker.updateVisibleCheckpointForShard. The fix intercepts the UPDATE_VISIBLE_CHECKPOINT transport action and gracefully handles theAssertionError — the same approach already applied to HotToWarmTieringServiceIT andWarmToHotTieringServiceIT.

This PR is intended to be merged into #21749.

Related Issues

Resolves the flaky TieringStatusIT.testTieringStatus failure (muted via @AwaitsFix in #21749 commit 7).

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Kavya Aggarwal <kavyaagg@amazon.com>
@GeekGlider GeekGlider requested a review from a team as a code owner May 20, 2026 11:07
@github-actions
Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Swallowed AssertionError

The interceptor catches AssertionError and sends an empty response without logging or re-throwing. If an assertion fails for reasons other than the known race condition (e.g., a genuine bug in ReplicationTracker), this silently masks the failure. The test will pass even though the underlying assertion indicates a problem. Consider logging the caught error or checking its message to ensure only the expected race condition is suppressed.

} catch (AssertionError e) {
    channel.sendResponse(TransportResponse.Empty.INSTANCE);
}
Swallowed AssertionError

The interceptor catches AssertionError and sends an empty response without logging or re-throwing. If an assertion fails for reasons other than the known race condition (e.g., a genuine bug in ReplicationTracker), this silently masks the failure. The test will pass even though the underlying assertion indicates a problem. Consider logging the caught error or checking its message to ensure only the expected race condition is suppressed.

} catch (AssertionError e) {
    channel.sendResponse(TransportResponse.Empty.INSTANCE);
}

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Log suppressed assertion errors

Silently catching and suppressing AssertionError can hide critical test failures and
make debugging difficult. Consider logging the error or re-throwing it after sending
the response to ensure test failures are visible.

server/src/internalClusterTest/java/org/opensearch/storage/tiering/TierCancelIT.java [331-340]

 mockTransportService.addRequestHandlingBehavior(
     SegmentReplicationSourceService.Actions.UPDATE_VISIBLE_CHECKPOINT,
     (handler, request, channel, task) -> {
         try {
             handler.messageReceived(request, channel, task);
         } catch (AssertionError e) {
+            logger.warn("AssertionError caught during checkpoint update", e);
             channel.sendResponse(TransportResponse.Empty.INSTANCE);
         }
     }
 );
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that silently catching AssertionError can hide test failures. Adding logging would improve debuggability. However, the suggestion doesn't fundamentally change the behavior and the current implementation may be intentional for handling expected assertion errors during tiering operations.

Medium

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for acea19b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant