8385723: Intermittent failure of serviceability/sa/ClhsdbInspect.java #31355

Open
plummercj wants to merge 1 commit into
openjdk:master from
plummercj:8385723_lingeredapp_gc
Conversation

@plummercj
Contributor

@plummercj plummercj commented Jun 2, 2026

Most SA tests rely on the debuggee basically being idle and not doing things like triggering a GC. Unfortunately this cannot be 100% guaranteed. One issue is with LingeredApp.main(), which wakes up from sleep every second to touch the lock file. This could generate a small amount of garbage which could trigger a GC. This normally never happens, but we were seeing a case where it did if CDS was disabled. Forcing a GC before the loop seems to fix the issue.
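As a rough illustration of the idea described above (not the actual LingeredApp source), the pattern is to force one collection and then enter a polling loop that allocates very little. The class name, the `lingerUntilDeleted` helper, and the `maxSpins` bound are hypothetical and exist only to make the sketch self-contained:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class IdleLoopSketch {
    // Hypothetical helper: touches the lock file until it is deleted
    // or maxSpins iterations have run. Returns the number of iterations.
    static int lingerUntilDeleted(Path lockFile, long spinDelayMillis, int maxSpins)
            throws IOException, InterruptedException {
        // Force a GC up front to reduce the risk of one happening in the loop.
        System.gc();
        int spins = 0;
        while (Files.exists(lockFile) && spins < maxSpins) {
            // Touch the lock file to signal that the app is alive.
            Files.setLastModifiedTime(lockFile,
                    FileTime.fromMillis(System.currentTimeMillis()));
            Thread.sleep(spinDelayMillis);
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) throws Exception {
        Path lock = Files.createTempFile("lingered", ".lck");
        try {
            System.out.println("spins=" + lingerUntilDeleted(lock, 10, 3));
        } finally {
            Files.deleteIfExists(lock);
        }
    }
}
```

The bound on iterations is only there so the sketch terminates; the real application loops until the lock file disappears.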

Tested with svc CI testing in all tiers. Also ran SA tests a large number of times with and without the fix, and it seems to be reliable.



Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8385723: Intermittent failure of serviceability/sa/ClhsdbInspect.java (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/31355/head:pull/31355
$ git checkout pull/31355

Update a local copy of the PR:
$ git checkout pull/31355
$ git pull https://git.openjdk.org/jdk.git pull/31355/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 31355

View PR using the GUI difftool:
$ git pr show -t 31355

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/31355.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper Bot commented Jun 2, 2026

👋 Welcome back cjplummer! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk Bot commented Jun 2, 2026

@plummercj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8385723: Intermittent failure of serviceability/sa/ClhsdbInspect.java

Reviewed-by: ayang, lmesnik, syan, kevinw

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 20 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk Bot changed the title from "8385723" to "8385723: Intermittent failure of serviceability/sa/ClhsdbInspect.java" Jun 2, 2026
@openjdk

openjdk Bot commented Jun 2, 2026

@plummercj To determine the appropriate audience for reviewing this pull request, one or more labels corresponding to different subsystems will normally be applied automatically. However, no automatic labelling rule matches the changes in this pull request. In order to have an "RFR" email sent to the correct mailing list, you will need to add one or more applicable labels manually using the /label pull request command.

Applicable Labels
  • build
  • client
  • compiler
  • core-libs
  • hotspot
  • hotspot-compiler
  • hotspot-gc
  • hotspot-jfr
  • hotspot-runtime
  • i18n
  • ide-support
  • javadoc
  • jdk
  • net
  • nio
  • security
  • serviceability
  • shenandoah

@plummercj
Contributor Author

/label serviceability

@openjdk openjdk Bot added the serviceability serviceability-dev@openjdk.org label Jun 2, 2026
@openjdk

openjdk Bot commented Jun 2, 2026

@plummercj
The serviceability label was successfully added.

@openjdk openjdk Bot added the rfr Pull request is ready for review label Jun 2, 2026
@mlbridge

mlbridge Bot commented Jun 2, 2026

Webrevs

@openjdk openjdk Bot added the ready Pull request is ready to be integrated label Jun 2, 2026
}
}
// Force a GC now to reduce the risk of one happening during the loop below.
System.gc();
Member

@lmesnik lmesnik Jun 2, 2026

The fix looks correct.
There are a few comments to consider:

  1. Wouldn't it make sense to test whether the issue is reproducible with other GCs?
  2. Does it make sense to start the app with '-XX:-DisableExplicitGC'?
  3. Run GC several times to try to push objects into the old generation.
  4. Rewrite the loop to create fewer objects to minimize GC triggering.
For example:

File theLockFile = new File(theLockFileName);
System.gc();
while (Files.exists(path)) {
    // Touch the lock to indicate our readiness
    theLockFile.setLastModified(epoch());
    isReady = true;
    Thread.sleep(spinDelay);
}

These are just fixes to consider. I am OK with pushing your fix first and improving it only if we still see these failures.

Contributor Author

  1. I did run CI testing that I believe includes using different GCs.
  2. Are you suggesting that we don't run the test if -XX:-DisableExplicitGC is used? I don't think it is a goal to have tests defend against all possible flags that might disrupt them.
  3. I actually started with 3 GCs. That worked, so I tried just one and it worked just as well. I'm not a GC expert, so I'm not really sure of the full impact these forced GCs have on the java heap. Maybe @albertnetymk can answer.
  4. I think the real issue is that without a forced GC, we might end up on the edge of doing a GC when entering the loop, so could easily trigger one with a few small allocations. With a GC done first, I don't see this loop ever producing enough garbage to trigger another GC. The test will time out first.
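One way to check the reasoning in point 4 empirically would be to compare collection counts before and after a loop like this one. The sketch below uses the standard `GarbageCollectorMXBean` API; the class name, the `totalGcCount` helper, and the loop body are hypothetical stand-ins rather than anything from the test itself:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCountSketch {
    // Sums the collection counts of all collectors in this JVM.
    // A count of -1 means "undefined" for that collector and is skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = bean.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Collect first so the loop does not start on the edge of a GC.
        System.gc();
        long before = totalGcCount();
        for (int i = 0; i < 5; i++) {
            String garbage = "tick " + i; // small allocation, similar in spirit to the loop
            Thread.sleep(10);
        }
        long after = totalGcCount();
        System.out.println("GCs during loop: " + (after - before));
    }
}
```

The printed number depends on the collector and heap settings, so this is a diagnostic aid rather than a guarantee that no GC occurs.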

Member

In this context, one System.gc() should be enough to dodge the disturbing GC between two consecutive debugger commands. (Using WB_FullGC would guard us against -XX:-DisableExplicitGC, but this workaround is good enough, IMO.)
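For context on the explicit-GC flag discussed here: on HotSpot, System.gc() is ignored when the DisableExplicitGC option is enabled (-XX:+DisableExplicitGC). A minimal sketch of how a program could detect that, assuming a HotSpot JVM where `com.sun.management.HotSpotDiagnosticMXBean` is available (the class and helper names are hypothetical, and this is not part of the proposed fix):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ExplicitGcCheckSketch {
    // Returns true if the JVM reports DisableExplicitGC as enabled.
    // Assumes HotSpot; returns false if the diagnostic bean is unavailable.
    static boolean explicitGcDisabled() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return false;
        }
        return Boolean.parseBoolean(bean.getVMOption("DisableExplicitGC").getValue());
    }

    // Small pure helper so the result can be reported in a log line.
    static String describe(boolean disabled) {
        return disabled
                ? "System.gc() will be ignored"
                : "System.gc() requests will be honored";
    }

    public static void main(String[] args) {
        System.out.println(describe(explicitGcDisabled()));
    }
}
```

A test could log this at startup to make it obvious when the forced GC is not taking effect.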

Member

@lmesnik lmesnik left a comment

Thank you for all explanations!

Labels

ready Pull request is ready to be integrated rfr Pull request is ready for review serviceability serviceability-dev@openjdk.org

5 participants