@bulasevich
Contributor

@bulasevich bulasevich commented Dec 23, 2025

We observed a deoptimization storm caused by the uncommon-trap generation logic in GraphKit::uncommon_trap. GraphKit::uncommon_trap consults the too_many_recompiles metric. If the threshold is exceeded, it replaces Deoptimization::Action_reinterpret with Deoptimization::Action_none (see code snippet below).

This replacement changes the uncommon_trap behavior: once execution hits the trap, the VM still deoptimizes but no longer recompiles the method. In the "unlucky" case where the code path leading to this uncommon_trap becomes hot, a deoptimization storm occurs (thousands of deoptimizations per second), causing a significant performance drop.

The original problematic method, which triggered repeated recompilations, is a high-performance compressed binary serialization algorithm with heavy use of conditional branches driven by bitmasks. See a standalone synthetic benchmark to reproduce the issue.

The issue arises when the method exceeds the global recompilation threshold before any per-reason trap counter reaches its own limit.

Current thresholds:

  • Recompilation Limit (too_many_recompiles):
    Condition: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1
    Default: 201 (derived from default PerMethodRecompilationCutoff = 400).
  • Specific Trap Limits (too_many_traps):
    Checks if the trap count for a specific reason exceeds:
    PerMethodTrapLimit (Default: 100) - for Reason_unstable_if, Reason_unstable_fused_if, etc.
    PerMethodSpecTrapLimit (Default: 5000) - for Reason_speculate_class_check, Reason_speculate_null_check, etc.

With the given defaults, if the only reason for recompiling the method is unstable_if, the system stabilizes after 100 traps (PerMethodTrapLimit). However, if the method experiences traps and recompilations for different reasons, the total number of recompilations can exceed 200 before the unstable_if trap limit is hit. This triggers Action_none and causes the deopt storm.
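
The arithmetic behind this scenario can be sketched with a small standalone model. This is not HotSpot code: the names below (Outcome, simulate, recompile_cutoff) are hypothetical, and the model assumes for simplicity that every trap leads to exactly one recompilation and that traps rotate evenly over the distinct reasons. Only the default values (100 and 400) and the cutoff formula are taken from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the per-method thresholds described above. NOT HotSpot code:
// all names are hypothetical; only the default values and the cutoff formula
// come from the PR description.
constexpr std::uint32_t kPerMethodTrapLimit = 100;            // default PerMethodTrapLimit
constexpr std::uint32_t kPerMethodRecompilationCutoff = 400;  // default PerMethodRecompilationCutoff

// Recompilation limit: (PerMethodRecompilationCutoff / 2) + 1.
constexpr std::uint32_t recompile_cutoff() {
  return kPerMethodRecompilationCutoff / 2 + 1;
}
static_assert(recompile_cutoff() == 201, "matches the default quoted above");

struct Outcome {
  std::uint32_t decompile_count;    // total recompilations when the model stopped
  std::uint32_t unstable_if_traps;  // unstable_if traps when the model stopped
  bool hit_recompile_limit_first;   // true: the Action_none path would be reached
};

// Assumption: each trap causes one recompilation, and unstable_if accounts for
// one out of every `num_reasons` traps (round-robin over the reasons).
inline Outcome simulate(std::uint32_t num_reasons) {
  assert(num_reasons > 0);
  std::uint32_t decompiles = 0;
  std::uint32_t unstable_if = 0;
  while (true) {
    if (unstable_if >= kPerMethodTrapLimit) {
      return {decompiles, unstable_if, false};  // per-reason limit stabilizes the method
    }
    if (decompiles >= recompile_cutoff()) {
      return {decompiles, unstable_if, true};   // global limit reached first
    }
    if (decompiles % num_reasons == 0) {
      ++unstable_if;  // this round's trap is an unstable_if trap
    }
    ++decompiles;
  }
}
```

Under these assumptions, simulate(1) stops at the unstable_if limit after 100 recompilations, while simulate(3) reaches the recompilation cutoff of 201 with the unstable_if counter still below 100. That second case corresponds to the situation where Action_none is selected.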

The proposal is a minimal change: apply the same too_many_recompiles threshold that GraphKit::uncommon_trap uses inside Parse::path_is_suitable_for_uncommon_trap. This ensures that on the final recompilation C2 gets a hint not to speculate on untaken branches anymore.

As an alternative solution, we can revisit GraphKit::uncommon_trap itself. This "Temporary fix" has persisted in the codebase for 17 years, so it is probably time to change it as well. Any comments are welcome.

  case Deoptimization::Action_reinterpret:
    // Temporary fix for 6529811 to allow virtual calls to be sure they
    // get the chance to go from mono->bi->mega
    if (!keep_exact_action &&
        Deoptimization::trap_request_index(trap_request) < 0 &&
        too_many_recompiles(reason)) {
      // This BCI is causing too many recompilations.
      if (C->log() != nullptr) {
        C->log()->elem("observe that='trap_action_change' reason='%s' from='%s' to='none'",
                Deoptimization::trap_reason_name(reason),
                Deoptimization::trap_action_name(action));
      }
      action = Deoptimization::Action_none;
      trap_request = Deoptimization::make_trap_request(reason, action);
    } else {
      C->set_trap_can_recompile(true);
    }

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8374307: Fix deoptimization storm caused by Action_none in GraphKit::uncommon_trap (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28966/head:pull/28966
$ git checkout pull/28966

Update a local copy of the PR:
$ git checkout pull/28966
$ git pull https://git.openjdk.org/jdk.git pull/28966/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28966

View PR using the GUI difftool:
$ git pr show -t 28966

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28966.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Dec 23, 2025

👋 Welcome back bulasevich! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Dec 23, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk

openjdk bot commented Dec 23, 2025

@bulasevich The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 23, 2025
@mlbridge

mlbridge bot commented Dec 23, 2025

Webrevs

}
return seems_never_taken(prob) &&
// Skip optimization if recompile limit is exceeded to avoid deopts without recompilation.
!C->too_many_recompiles(method(), bci(), Deoptimization::Reason_unstable_if) &&
Member

You can use Compile::too_many_traps_or_recompile here.

Member

@TobiHartmann TobiHartmann left a comment

Is this related to JDK-8243615? Could you convert your UnstableIf.java test to a jtreg test? Maybe by running in a different process and counting the number of deoptimization events? JDK-8243615 also has a test attached.

@openjdk openjdk bot removed the rfr Pull request is ready for review label Jan 7, 2026
@openjdk
Copy link

openjdk bot commented Jan 7, 2026

@bulasevich Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 7, 2026
@bulasevich
Contributor Author

> Is this related to JDK-8243615?

Oh, yes, this is related, and we had a similar fix five years ago. @WZhuo

> Could you convert your UnstableIf.java test to a jtreg test?

Done. I converted it to a jtreg test. I’m skipping the heavyweight part (200+ lines of code) that reproduces the issue without changing the PerMethodRecompilationCutoff limit.

@iwanowww
Contributor

iwanowww commented Jan 7, 2026

I believe every place where an uncommon trap with Action_reinterpret is guarded by too_many_traps is susceptible to the very same problem.

The culprit seems to be the discrepancy between too_many_traps and too_many_recompiles: many of the places that insert uncommon traps are guarded by too_many_traps, while GraphKit::uncommon_trap() checks specifically for too_many_recompiles.

As the bug demonstrates, disabling recompilation while keeping the uncommon trap in place (substituting Action_reinterpret/Action_maybe_recompile with Action_none) can induce a lot of overhead. So, a better strategy is to avoid the uncommon trap in the first place rather than letting it degenerate into Action_none and, also, to assert whenever the situation occurs at runtime.

Speaking of the proposed fix, my concern is that it addresses only one particular instance of the problem. Can we do better and fix similar bugs all at once? That would require aligning too_many_traps and too_many_recompiles use sites.

@iwanowww
Contributor

iwanowww commented Jan 7, 2026

BTW JDK-6529811 did not introduce the heuristic in GraphKit::uncommon_trap(). The code predates OpenJDK. JDK-6529811 mentions an alternative way to fix the pathological behavior:

5. The Action_none bailout is dangerous. GraphKit::uncommon_trap should bail out to Action_make_not_compilable. That way the log will print an interesting failure event, and performance will degrade into the interpreter, which is faster than the deoptimizer.
