Skip to content

Add tolerance for matrix stats agg variance value assertion #18367

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 30, 2025

Conversation

kkewwei
Copy link
Collaborator

@kkewwei kkewwei commented May 26, 2025

Description

In #18351, this flaky test failed again. Subsequently, after I ran 100 iterations, and the same problem occurred elsewhere

Related Issues

Resolves #18129

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kkewwei kkewwei requested a review from a team as a code owner May 26, 2025 07:36
@github-actions github-actions bot added >test-failure Test failure from CI, local build, etc. autocut flaky-test Random test failure that succeeds on second run labels May 26, 2025
Copy link
Contributor

❌ Gradle check result for fab600c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei kkewwei closed this May 26, 2025
@kkewwei kkewwei reopened this May 26, 2025
Copy link
Contributor

❌ Gradle check result for fab600c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: kkewwei <[email protected]>
Signed-off-by: kkewwei <[email protected]>
Copy link
Contributor

✅ Gradle check result for db0b09f: SUCCESS

Copy link

codecov bot commented May 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.67%. Comparing base (fe4a98d) to head (f22cd60).
Report is 3 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18367      +/-   ##
============================================
+ Coverage     72.60%   72.67%   +0.06%     
- Complexity    67682    67683       +1     
============================================
  Files          5497     5497              
  Lines        311819   311817       -2     
  Branches      45265    45265              
============================================
+ Hits         226409   226601     +192     
+ Misses        66941    66724     -217     
- Partials      18469    18492      +23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andrross
Copy link
Member

@bowenlan-amzn Can you take a look? I actually could not reproduce these failures with the seeds listed above, but also adding a variance makes sense to me as I suspect some things could be platform and/or JDK specific and we're likely to continue running in to problems in the future doing exact comparisons with floating point values.

Copy link
Member

@bowenlan-amzn bowenlan-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kkewwei Thanks a lot for helping on this!

I suggest to use 1.0e-12 as the delta in all cases. So can refactor all the assertions here into one place with a little explanation about why delta is needed. Sth like:

// Concurrent search could have small difference in floating-point results when merging from different way of slicing shard.

cc: @andrross

I think this problem happens non-deterministiclly because our merge algorithm is random and not controlled by seed.
Here we index 5 documents, but how many segments in the end before searching is random depending on the merge.
This is based on my past experience and what I explain to myself but not sure if it's the true cause

@kkewwei
Copy link
Collaborator Author

kkewwei commented May 29, 2025

@bowenlan-amzn Platform & JDK: OpenJDK-23.0.2, macOS M1 Pro.

1.0e-12 seems to be a suitable value, so far, none of the errors have exceeded this value. I will modify it like this.

Signed-off-by: kkewwei <[email protected]>
Signed-off-by: kkewwei <[email protected]>
Copy link
Contributor

❌ Gradle check result for a9b9253: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei kkewwei closed this May 29, 2025
@kkewwei kkewwei reopened this May 29, 2025
Copy link
Contributor

❌ Gradle check result for a9b9253: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kkewwei kkewwei closed this May 29, 2025
@kkewwei kkewwei reopened this May 29, 2025
Copy link
Contributor

❌ Gradle check result for a9b9253: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

✅ Gradle check result for f22cd60: SUCCESS

@kkewwei
Copy link
Collaborator Author

kkewwei commented May 30, 2025

@andrross @bowenlan-amzn Can we merge it in?

@andrross andrross merged commit 6320201 into opensearch-project:main May 30, 2025
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autocut flaky-test Random test failure that succeeds on second run >test-failure Test failure from CI, local build, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for SearchIT
3 participants