Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix simultaneously creating a snapshot and updating the repository can potentially trigger an infinite loop #17532

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kkewwei
Copy link
Contributor

@kkewwei kkewwei commented Mar 6, 2025

Description

Related Issues

Resolves #17531

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

github-actions bot commented Mar 6, 2025

❌ Gradle check result for 2a17f65: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei kkewwei force-pushed the fix_loop_in_updateRep branch 2 times, most recently from 0a9da2d to 42e3000 Compare March 6, 2025 16:08
@kkewwei kkewwei force-pushed the fix_loop_in_updateRep branch 3 times, most recently from f25fde9 to 5b2dd4a Compare March 7, 2025 12:55
Copy link
Contributor

github-actions bot commented Mar 7, 2025

❌ Gradle check result for 5b2dd4a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei kkewwei force-pushed the fix_loop_in_updateRep branch from 5b2dd4a to d766eb3 Compare March 7, 2025 13:55
Copy link
Contributor

github-actions bot commented Mar 7, 2025

❌ Gradle check result for d766eb3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei kkewwei force-pushed the fix_loop_in_updateRep branch from d766eb3 to 2757776 Compare March 8, 2025 05:08
Copy link
Contributor

github-actions bot commented Mar 8, 2025

❌ Gradle check result for 2757776: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei kkewwei force-pushed the fix_loop_in_updateRep branch from 2757776 to f234048 Compare March 8, 2025 07:23
Copy link
Contributor

github-actions bot commented Mar 8, 2025

❌ Gradle check result for f234048: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei kkewwei force-pushed the fix_loop_in_updateRep branch from f234048 to 7957cdb Compare March 10, 2025 04:07
@kkewwei kkewwei marked this pull request as ready for review March 10, 2025 04:08
Copy link
Contributor

❕ Gradle check result for 7957cdb: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Mar 10, 2025

Codecov Report

Attention: Patch coverage is 16.66667% with 5 lines in your changes missing coverage. Please review.

Project coverage is 72.48%. Comparing base (dcad6b8) to head (27de151).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...ch/repositories/blobstore/BlobStoreRepository.java 20.00% 2 Missing and 2 partials ⚠️
.../org/opensearch/repositories/FilterRepository.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17532      +/-   ##
============================================
+ Coverage     72.46%   72.48%   +0.01%     
+ Complexity    65757    65746      -11     
============================================
  Files          5311     5311              
  Lines        305001   305017      +16     
  Branches      44230    44232       +2     
============================================
+ Hits         221022   221094      +72     
+ Misses        65932    65777     -155     
- Partials      18047    18146      +99     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kkewwei
Copy link
Contributor Author

kkewwei commented Mar 10, 2025

UNSTABLE

❕ Gradle check result for 7957cdb: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation ##14293

@Override
public void executeConsistentStateUpdate(
Function<RepositoryData, ClusterStateUpdateTask> createUpdateTask,
String source,
Supplier<Repository> currentRepositeSupplier,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to change the API, I'm wondering if we can retry in SnapshotsService, that is, re-call the createSnapshot method when startRepository != currentRepository

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that there is no elegant way to exit unless the executeConsistentStateUpdate is modified.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use the onFailure to do retry in SnapshotService. When this repo is closed, not to re-call the executeConsistentStateUpdate and invoke the onFailure.

Copy link
Contributor

❌ Gradle check result for b054499: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei
Copy link
Contributor Author

kkewwei commented Mar 14, 2025

❌ Gradle check result for b054499: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

S3RemoteStoreIT.testPeerRecoveryWithLowActivityTimeout #16145

@kkewwei kkewwei force-pushed the fix_loop_in_updateRep branch from b054499 to b3e8d32 Compare March 19, 2025 06:47
Copy link
Contributor

❌ Gradle check result for b3e8d32: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei kkewwei force-pushed the fix_loop_in_updateRep branch from b3e8d32 to d0ae308 Compare March 19, 2025 10:27
Copy link
Contributor

❌ Gradle check result for d0ae308: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei kkewwei force-pushed the fix_loop_in_updateRep branch from d0ae308 to 27de151 Compare March 19, 2025 11:05
Copy link
Contributor

❕ Gradle check result for 27de151: UNSTABLE

  • TEST FAILURES:
      2 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testSnapshotWithStuckNode

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@kkewwei kkewwei force-pushed the fix_loop_in_updateRep branch from 27de151 to 53a2e8c Compare March 20, 2025 04:31
Copy link
Contributor

❌ Gradle check result for 53a2e8c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Storage:Snapshots
Projects
Status: No status
2 participants