8385723: Intermittent failure of serviceability/sa/ClhsdbInspect.java#31355
8385723: Intermittent failure of serviceability/sa/ClhsdbInspect.java#31355plummercj wants to merge 1 commit into
Conversation
|
👋 Welcome back cjplummer! A progress list of the required criteria for merging this PR into |
|
@plummercj This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 20 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
@plummercj To determine the appropriate audience for reviewing this pull request, one or more labels corresponding to different subsystems will normally be applied automatically. However, no automatic labelling rule matches the changes in this pull request. In order to have an "RFR" email sent to the correct mailing list, you will need to add one or more applicable labels manually using the /label pull request command. Applicable Labels
|
|
/label serviceability |
|
@plummercj |
| } | ||
| } | ||
| // Force a GC now to reduce the risk of one happening during the loop below. | ||
| System.gc(); |
There was a problem hiding this comment.
The fix looks correct.
There are few comments to consider:
- Wouldn't it makes a to try to test if issue is reproducible with other GC?
- Does it makes to don't it makes sense to start app with '-XX:-DisableExplicitGC'?
- Run GC several times to try to push object into old generation.
- Re-write loop to create less number of object to minimize GC triggering.
Like
File theLockFile = <file from theLockFileName>;
System.gc();
while (Files.exists(path)) {
// Touch the lock to indicate our readiness
theLockFiles.etLastModified(theLockFile, epoch());
isReady = true;
Thread.sleep(spinDelay);
}
}
These are just fix to consider, I am ok to first try to push your fix and improve it only we still these failures.
There was a problem hiding this comment.
- I did run CI testing that I believe includes using different GCs.
- Are you suggesting that we don't run the test if -XX:-DisableExplicitGC is used? I don't think it is a goal to have tests defend against all possible flags that might disrupted it.
- I actually started with 3 GCs. That worked, so I tried just one and it worked just as well. I'm not a GC expert, so I'm not really sure of the full impact these forced GCs have on the java heap. Maybe @albertnetymk can answer.
- I think the real issue is that without a forced GC, we might end up on the edge of doing a GC when entering the loop, so could easily trigger one with a few small allocations. With a GC done first, I don't see this loop ever producing enough garbage to trigger another GC. The test will time out first.
There was a problem hiding this comment.
In this context, one System.gc() should be enough to dodge the disturbing GC btw two consecutive debugger commands. (Using WB_FullGC would guard ourselves from XX:-DisableExplicitGC, but this workaround is good enough, IMO.)
lmesnik
left a comment
There was a problem hiding this comment.
Thank you for all explanations!
Most SA tests rely on the debuggee basically being idle and not doing things like triggering a GC. Unfortunately this cannot be 100% guaranteed. One issue is with LingeredApp.main(), which wakes up from sleep every second to touch the lock file. This could generate a small amount of garbage which could trigger a GC. This normally never happens, but we were seeing a case where it did if CDS was disabled. Forcing a GC before the loop seems to fix the issue.
Tested with svc CI testing in all tiers. Also ran SA tests a large number of times with and without the fix, and it seems to be reliable.
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/31355/head:pull/31355$ git checkout pull/31355Update a local copy of the PR:
$ git checkout pull/31355$ git pull https://git.openjdk.org/jdk.git pull/31355/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 31355View PR using the GUI difftool:
$ git pr show -t 31355Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/31355.diff
Using Webrev
Link to Webrev Comment