Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenJ9 Crashes on ECJ-Compiled Class File #21417

Open
yingquanzhao opened this issue Mar 19, 2025 · 13 comments
Open

OpenJ9 Crashes on ECJ-Compiled Class File #21417

yingquanzhao opened this issue Mar 19, 2025 · 13 comments

Comments

@yingquanzhao
Copy link

Java -version output

JDK 9:
openjdk version "1.8.0_442"
IBM Semeru Runtime Open Edition (build 1.8.0_442-b06)
Eclipse OpenJ9 VM (build openj9-0.49.0, JRE 1.8.0 Mac OS X amd64-64-Bit Compressed References 20250205_1183 (JIT enabled, AOT enabled)
OpenJ9   - 3c3d179854
OMR      - e49875871
JCL      - 61f83383b8 based on jdk8u442-b06)

JDK 11:
openjdk version "11.0.26" 2025-01-21
IBM Semeru Runtime Open Edition 11.0.26.0 (build 11.0.26+4)
Eclipse OpenJ9 VM 11.0.26.0 (build openj9-0.49.0, JRE 11 Mac OS X aarch64-64-Bit 20250205_839 (JIT enabled, AOT enabled)
OpenJ9   - 3c3d179854
OMR      - e49875871
JCL      - 674ad23a80 based on jdk-11.0.26+4)

JDK 17:
openjdk version "17.0.14" 2025-01-21
IBM Semeru Runtime Open Edition 17.0.14.0 (build 17.0.14+7)
Eclipse OpenJ9 VM 17.0.14.0 (build openj9-0.49.0, JRE 17 Mac OS X aarch64-64-Bit 20250121_796 (JIT enabled, AOT enabled)
OpenJ9   - 3c3d179854
OMR      - e49875871
JCL      - cbbc8b94a62 based on jdk-17.0.14+7)

JDK 21:
openjdk version "21.0.6" 2025-01-21 LTS
IBM Semeru Runtime Open Edition 21.0.6.0 (build 21.0.6+7-LTS)
Eclipse OpenJ9 VM 21.0.6.0 (build openj9-0.49.0, JRE 21 Mac OS X aarch64-64-Bit 20250121_371 (JIT enabled, AOT enabled)
OpenJ9   - 3c3d179854
OMR      - e49875871
JCL      - e01368f00df based on jdk-21.0.6+7)

Output from java -version.

Summary of problem

Hi, we have identified a test program that causes OpenJ9 to crash.

Below is the source code of the test program:

public class Test {
    public static float func() {
        int a = 223, arr[]=new int[400];
        for (int i = 1; i < 100; i++) {
            for (int i1 = 1; i1 < 10; i1++){
                try {
                    a = (1 % a);
                    arr[i1 + 1] = (i1 / -871268305);
                } catch (ArithmeticException e) {}
            }
        }
        return Double.doubleToLongBits(1);
    }

    public static void main(String[] strArr) {
        for (int i = 1; i < 100; ++i) {
            func();
        }
    }
}

When this program is compiled by javac, OpenJ9 runs the generated class file without any issues. However, when compiled using ecj, running the generated class file on OpenJ9 triggers a crash, while HotSpot runs it successfully without any problem. This issue is reproducible on OpenJDK 11, 17, and 21 but does not occur on OpenJDK 8, where the program executes normally.

Steps to Reproduce

The following commands were used to compile and execute the test program:

pathTo/jdk-21.0.6+7/bin/java -cp . -jar ./ecj-4.35.jar --release 21 -d . Test.java
pathToOpenJ9/jdk-21.0.6+7/bin/java -cp . Test

pathTo/jdk-21.0.6+7/bin/java -cp . -jar ./ecj-4.35.jar --release 17 -d . Test.java
pathToOpenJ9/jdk-17.0.14+7-jre/bin/java -cp . Test

pathTo/jdk-21.0.6+7/bin/java -cp . -jar ./ecj-4.35.jar --release 11 -d . Test.java
pathToOpenJ9/jdk-11.0.26+4-jre/bin/java -cp . Test

All ECJ-compiled test programs crash when executed using OpenJ9. We have uploaded the test program along with the necessary dependencies, which can be downloaded from CrashWithEcj.zip.

Would appreciate your insights into this issue. Thanks!

Note that, this issue is only reproducible on Ubuntu. On macOS, OpenJ9 does not crash.

Software and Hardware Environment:

OS:
    lsb_release -a output:		
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 20.04.6 LTS
    Release:        20.04
    Codename:       focal

Diagnostic files

The core dump file can be found in the attachment.

@hangshao0
Copy link
Contributor

It is an assertion failure in JIT.

Unhandled exception
Type=Unhandled trap vmState=0x000544ff
J9Generic_Signal_Number=00000108 Signal_Number=00000005 Error_Value=00000000 Signal_Code=fffffffa
Handler1=00007F7FADE7BC50 Handler2=00007F7FADCE47A0
RDI=000000000001B35E RSI=000000000001B361 RAX=0000000000000000 RBX=00007F7F94BAF640
RCX=00007F7FAE2119FC RDX=0000000000000005 R8=0000000000000038 R9=0000000000000000
R10=0000000000000001 R11=0000000000000246 R12=0000000000000005 R13=0000000000000016
R14=00007F7F8D8C1120 R15=0000000000000000
RIP=00007F7FAE2119FC GS=0000 FS=0000 RSP=00007F7F94BA4050
EFlags=0000000000000246 CS=0033 RBP=000000000001B361 ERR=0000000000000000
TRAPNO=0000000000000000 OLDMASK=0000000000000000 CR2=0000000000000000
xmm0=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm1=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm2=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm3=373d637620313d63 (f: 540097920.000000, d: 1.317827e-42)
xmm4=39323d6e76203432 (f: 1981821952.000000, d: 3.512890e-33)
xmm5=205d306463326337 (f: 1664246528.000000, d: 8.708070e-153)
xmm6=2020202020202020 (f: 538976256.000000, d: 6.013470e-154)
xmm7=2020202020202020 (f: 538976256.000000, d: 6.013470e-154)
xmm8=63206c6c75662072 (f: 1969627264.000000, d: 3.099089e+169)
xmm9=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10=6969161669696952 (f: 1768515968.000000, d: 6.000721e+199)
xmm11=0000015200000151 (f: 337.000000, d: 7.172346e-312)
xmm12=0000013d00000140 (f: 320.000000, d: 6.726727e-312)
xmm13=000001380000013f (f: 319.000000, d: 6.620627e-312)
xmm14=0000000008001800 (f: 134223872.000000, d: 6.631540e-316)
xmm15=000001420000013b (f: 315.000000, d: 6.832826e-312)
Module=/lib/x86_64-linux-gnu/libc.so.6
Module_base_address=00007F7FAE17B000 Symbol=pthread_kill
Symbol_address=00007F7FAE2118D0

Method_being_compiled=Test.func()F
Target=2_90_20250310_000000 (Linux 5.15.0-134-generic)
CPU=amd64 (16 logical CPUs) (0x7d5a67000 RAM)
----------- Stack Backtrace -----------
pthread_kill+0x12c (0x00007F7FAE2119FC [libc.so.6+0x969fc])
raise+0x16 (0x00007F7FAE1BD476 [libc.so.6+0x42476])
_ZN2TR4trapEv+0x4d (0x00007F7FAD0E90ED [libj9jit29.so+0x60e0ed])
_ZN2TR27fatal_assertion_with_detailERKNS_16AssertionContextEPKciS4_S4_z+0xa9 (0x00007F7FAD0E9229 [libj9jit29.so+0x60e229])
_ZN12TR_CISCGraph14importUDchainsEPN2TR11CompilationEP13TR_UseDefInfob+0xb2b (0x00007F7FACE2E4BB [libj9jit29.so+0x3534bb])
_ZN18TR_CISCTransformer27computeTopologicalEmbeddingEP12TR_CISCGraphS1_+0x51b (0x00007F7FACE2E9FB [libj9jit29.so+0x3539fb])
_ZN18TR_CISCTransformer7performEv+0xe27 (0x00007F7FACE30987 [libj9jit29.so+0x355987])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii.localalias+0x865 (0x00007F7FAD2B5F35 [libj9jit29.so+0x7daf35])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii.localalias+0xc2f (0x00007F7FAD2B62FF [libj9jit29.so+0x7db2ff])
_ZN3OMR9Optimizer8optimizeEv+0x1c3 (0x00007F7FAD2B7CA3 [libj9jit29.so+0x7dcca3])
_ZN3OMR11Compilation7compileEv+0xa65 (0x00007F7FAD0A1A15 [libj9jit29.so+0x5c6a15])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x4c3 (0x00007F7FACC5B203 [libj9jit29.so+0x180203])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x3a9 (0x00007F7FACC5C249 [libj9jit29.so+0x181249])
omrsig_protect+0x241 (0x00007F7FADCE5451 [libj9prt29.so+0x2a451])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x395 (0x00007F7FACC59D45 [libj9jit29.so+0x17ed45])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x12c (0x00007F7FACC5A0AC [libj9jit29.so+0x17f0ac])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x377 (0x00007F7FACC58F77 [libj9jit29.so+0x17df77])
_ZN2TR24CompilationInfoPerThread3runEv+0x42 (0x00007F7FACC592E2 [libj9jit29.so+0x17e2e2])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x86 (0x00007F7FACC59396 [libj9jit29.so+0x17e396])
omrsig_protect+0x241 (0x00007F7FADCE5451 [libj9prt29.so+0x2a451])
_Z21compilationThreadProcPv+0x17f (0x00007F7FACC5976F [libj9jit29.so+0x17e76f])
thread_wrapper+0x187 (0x00007F7FADE25C07 [libj9thr29.so+0xbc07])
 (0x00007F7FAE20FAC3 [libc.so.6+0x94ac3])
 (0x00007F7FAE2A1850 [libc.so.6+0x126850])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2025/03/19 08:44:53 - please wait.
JVMDUMP032I JVM requested System dump using '/root/PRs/21417/CrashWithEcj/core.20250319.084453.111454.0001.dmp' in response to an event
JVMDUMP010I System dump written to /root/PRs/21417/CrashWithEcj/core.20250319.084453.111454.0001.dmp
JVMDUMP032I JVM requested Java dump using '/root/PRs/21417/CrashWithEcj/javacore.20250319.084453.111454.0002.txt' in response to an event
JVMDUMP010I Java dump written to /root/PRs/21417/CrashWithEcj/javacore.20250319.084453.111454.0002.txt
JVMDUMP032I JVM requested Snap dump using '/root/PRs/21417/CrashWithEcj/Snap.20250319.084453.111454.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /root/PRs/21417/CrashWithEcj/Snap.20250319.084453.111454.0003.trc
JVMDUMP032I JVM requested JIT dump using '/root/PRs/21417/CrashWithEcj/jitdump.20250319.084453.111454.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in 'JIT Compilation Thread-000' thread 0x0000000000025C00
JVMDUMP049I JIT dump notified all waiting threads of the current method to be compiled
JVMDUMP054I JIT dump is tracing the IL of the method on the crashed compilation thread
JVMDUMP048I JIT dump method being compiled is an ordinary method
JVMDUMP053I JIT dump is recompiling Test.func()F
Assertion failed at /root/CCM/JDK21/openj9-openjdk-jdk21/openj9/runtime/compiler/optimizer/IdiomRecognition.cpp:1536: trNode->getSymbol()->isStatic()
VMState: 0x000544ff
        Node 0x7f7f8c7c2d20 [astore]: direct store to non-auto, non-static
compiling Test.func()F at level: hot

@hangshao0
Copy link
Contributor

FYI @hzongaro

@keithc-ca
Copy link
Contributor

Perhaps the significant difference in the bytecode is in the catch block.

javac produces code which stores the ArithmeticException:

    41  astore 4
...
      Exception Table:
        [pc: 25, pc: 38] -> 41 when : java.lang.ArithmeticException

The compiler built into the eclipse IDE does essentially the same thing.
However, the code produced by ECJ pops the exception:

    35  pop
      Exception Table:
        [pc: 19, pc: 32] -> 35 when : java.lang.ArithmeticException

Both approaches seem entirely reasonable to me.

@hzongaro
Copy link
Member

Thanks, @keithc-ca! Yes, it does seem that that’s the source of the problem.

n13n      BBStart <block_11> (freq 6) (catches java/lang/ArithmeticException)                 [0x7f5e9e2f5c50] bci=[-1,35,13] rc=0 vc=3 vn=- li=- udi=- nc=0
n89n      asynccheck  jitCheckAsyncMessages[#23  helper Method] [flags 0x400 0x0 ]            [0x7f5e9e371b90] bci=[-1,35,13] rc=0 vc=4 vn=- li=- udi=- nc=0
n90n      astore  <pending push temp 0>[#426  Auto] [flags 0x7 0x800 ]                        [0x7f5e9e371be0] bci=[-1,35,13] rc=0 vc=4 vn=- li=- udi=- nc=1
n88n        aload  ExceptionMeta[#344  MethodMeta +64] [flags 0x207 0x0 ]                     [0x7f5e9e371b40] bci=[-1,35,13] rc=1 vc=4 vn=- li=- udi=- nc=0
n92n      astore  <temp slot 5>[#427  Auto] [flags 0x7 0x0 ]                                  [0x7f5e9e371c80] bci=[-1,35,13] rc=0 vc=4 vn=- li=- udi=- nc=1
n91n        aload  ExceptionMeta[#344  MethodMeta +64] [flags 0x207 0x0 ]                     [0x7f5e9e371c30] bci=[-1,35,13] rc=1 vc=4 vn=- li=- udi=- nc=0
n94n      astore  ExceptionMeta[#344  MethodMeta +64] [flags 0x207 0x0 ]                      [0x7f5e9e371d20] bci=[-1,35,13] rc=0 vc=4 vn=- li=- udi=- nc=1
n93n        aconst NULL (X==0 )                                                               [0x7f5e9e371cd0] bci=[-1,35,13] rc=1 vc=4 vn=- li=- udi=- nc=0 flg=0x2
n96n      goto --> block_6 BBStart at n5n                                                     [0x7f5e9e371dc0] bci=[-1,36,9] rc=0 vc=4 vn=- li=- udi=- nc=0
n14n      BBEnd </block_11> =====                                                             [0x7f5e9e2f5ca0] bci=[-1,36,9] rc=0 vc=4 vn=- li=- udi=- nc=0

…

#344:  ExceptionMeta[ MethodMeta +64] [flags 0x207 0x0 ] [0x7f5e9e361f70] (Address)

@hzongaro
Copy link
Member

@a7ehuo, may I ask you to take a look at this problem?

@jdmpapin
Copy link
Contributor

For a bit of context, the store of null to ExceptionMeta is there so that we won't hold the exception object live until the next throw, which could be indefinitely. Most of the time it should be no problem to keep a reference to the exception object, but very rarely it's possible that it could be the only way to reach a sizable set of objects, or it might prevent a finalizer or cleaner from running. Of course, there are no guarantees on when an object will be collected, but it makes sense to try to get out of the way and allow collection when it's easy to do so. I don't know why IL gen generates this store only when the first instruction is not astore (apparently including for some reason when it's astore_0, astore_1, etc.). The rationale seems to me to apply regardless of the bytecode in the block.

VP can generate a similar store when it does a throw-to-goto transformation since eclipse-omr/omr#392. The previous way of transforming throw to goto relied on the presence of an astore at the start of the catch block, and it made some dubious assumptions about the code in the exception handler. Using a store to ExceptionMeta made it "just work."

(Aside: From digging in ancient history, IL gen creates a temp because at one point there was a bug that allowed ExceptionMeta to be clobbered before it was loaded. I think we could probably get rid of the temp and just push ExceptionMeta and then overwrite it with null. The aliasing looks to be correct for calls already. It might need to be adjusted for other exception points, but maybe that needs to be done anyway.)

I suppose my point is that I think it makes sense if possible to treat such stores as valid rather than to try to get rid of them, in case the latter was being considered.

@a7ehuo
Copy link
Contributor

a7ehuo commented Mar 31, 2025

@jdmpapin Thank you very much for the clarification!

(1)

I think we could probably get rid of the temp and just push ExceptionMeta and then overwrite it with null. The aliasing looks to be correct for calls already. It might need to be adjusted for other exception points, but maybe that needs to be done anyway.)

Could you elaborate more on this and why should it be changed this way here? Do you mean getting rid of temp #429 (astore n92n) here? Before processing the case of the bytecode being not J9BCastore, n91n aload ExceptionMeta is on the stack. Do you mean pushing another load or a store of ExceptionMeta on the stack? I'm not sure I follow.

n13n      BBStart <block_11> (catches java/lang/ArithmeticException)                          [0x7fb178d19ca0] bci=[-1,35,13] rc=0 vc=0 vn=- li=- udi=- nc=0
n89n      asynccheck  jitCheckAsyncMessages[#23  helper Method] [flags 0x400 0x0 ]            [0x7fb178d95b90] bci=[-1,35,13] rc=0 vc=0 vn=- li=- udi=- nc=0
n90n      astore  <pending push temp 0>[#428  Auto] [flags 0x7 0x800 ]                        [0x7fb178d95be0] bci=[-1,35,13] rc=0 vc=0 vn=- li=- udi=- nc=1
n88n        aload  ExceptionMeta[#345  MethodMeta +64] [flags 0x207 0x0 ]                     [0x7fb178d95b40] bci=[-1,35,13] rc=1 vc=0 vn=- li=- udi=- nc=0
n92n      astore  <temp slot 5>[#429  Auto] [flags 0x7 0x0 ]                                  [0x7fb178d95c80] bci=[-1,35,13] rc=0 vc=0 vn=- li=- udi=- nc=1
n91n        aload  ExceptionMeta[#345  MethodMeta +64] [flags 0x207 0x0 ]                     [0x7fb178d95c30] bci=[-1,35,13] rc=1 vc=0 vn=- li=- udi=- nc=0
n94n      astore  ExceptionMeta[#345  MethodMeta +64] [flags 0x207 0x0 ]                      [0x7fb178d95d20] bci=[-1,35,13] rc=0 vc=0 vn=- li=- udi=- nc=1
n93n        aconst NULL (X==0 )                                                               [0x7fb178d95cd0] bci=[-1,35,13] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x2
n96n      goto --> block_6 BBStart at n5n                                                     [0x7fb178d95dc0] bci=[-1,36,9] rc=0 vc=0 vn=- li=- udi=- nc=0
n14n      BBEnd </block_11> ===== 

(2)
Since storing NULL into ExceptionMeta in ILGen is intended if I understand it correctly, I will look at if the assert in TR_CISCGraph makes sense or if it should make an exception for ExceptionMeta.

@jdmpapin
Copy link
Contributor

Since storing NULL into ExceptionMeta in ILGen is intended if I understand it correctly, I will look at if the assert in TR_CISCGraph makes sense or if it should make an exception for ExceptionMeta.

Thanks 👍

I think we could probably get rid of the temp and just push ExceptionMeta and then overwrite it with null. The aliasing looks to be correct for calls already. It might need to be adjusted for other exception points, but maybe that needs to be done anyway.)

Could you elaborate more on this and why should it be changed this way here?

First please note that this was an aside. It's not a suggestion for how to deal with this issue. It's just a tangentially related thought.

Anyway, it would simplify the logic, avoid unnecessary temps, get rid of the inconsistency between astore and astore_n, and make sure that ExceptionMeta is always cleared regardless of the bytecode.

I think it will be most obvious if you think about the overall effect of what IL gen does at the start of the handler before it starts walking its bytecode. Right now there are two cases. If the first bytecode instruction is astore, then we just push aload ExceptionMeta. If OTOH the first bytecode instruction is anything else, we instead store ExceptionMeta into a temp, then set ExceptionMeta to null, then push a load of the temp. My suggestion is to ignore the first bytecode instruction and have only one case. We could push a load of ExceptionMeta, then set it to null, and that's it. We'd get these two trees:

treetop
  aload ExceptionMeta
astore ExceptionMeta
  aconst NULL

and the aload would be on the stack, and after that it shouldn't matter what bytecode follows. If the first bytecode instruction happens to be astore, then the difference would be that we would null out ExceptionMeta, which currently doesn't happen in that case. And that's the main point of the suggestion. If OTOH it happens not to be astore, then the difference would be that we avoid the temp, which I don't think we actually need.

@a7ehuo
Copy link
Contributor

a7ehuo commented Mar 31, 2025

Thanks for taking the time to clarify my questions @jdmpapin! Now I understand the proposed suggestion in ILGen is to have the consistence of clearing ExceptionMeta regardless of if the first byte code is astore or not.

For example, if the first bytecode is JBpop:

astore  <temp slot 5>
  aload  ExceptionMeta
astore  ExceptionMeta
  aconst NULL

==> after the change 

treetop
  aload ExceptionMeta
astore ExceptionMeta
  aconst NULL

If the first bytecode is JBStore:

astore  <auto slot 4>
  aload  ExceptionMeta

==> after the change 

treetop
  aload ExceptionMeta
astore ExceptionMeta
  aconst NULL
astore  <auto slot 4>
  ==> aload  ExceptionMeta

@jdmpapin
Copy link
Contributor

jdmpapin commented Mar 31, 2025

Right. I've just made a small edit to your comment to show the commoning in the last trees snippet

(Well, I had. It seems you've undone my edit. But in the last snippet, the second occurrence of aload ExceptionMeta should be commoned.)

@a7ehuo
Copy link
Contributor

a7ehuo commented Mar 31, 2025

It seems you've undone my edit. But in the last snippet, the second occurrence of aload ExceptionMeta should be commoned

Whoops, sorry, I thought I copy/pasted the commoning "by mistake" in my original post 😅. Just added it back

@a7ehuo
Copy link
Contributor

a7ehuo commented Apr 1, 2025

After looking at the code around the fatal assert, I think MethodMetaData should be treated the same as static so that storing into MethodMetaData won't be considered negligible. It should be exempted from the assert.

TR_ASSERT_FATAL_WITH_NODE(
trNode,
trNode->getSymbol()->isStatic(),
"direct store to non-auto, non-static");
// Regardless of the uses that we can see within this method (which
// may not even appear, since we won't necessarily get use-def
// indices for static accesses), it's always possible that the
// destination static is used after the loop, but outside of this
// method. Recognizing this possibility will prevent the store from
// being considered negligible and ultimately incorrectly removed.

            TR_ASSERT_FATAL_WITH_NODE(
               trNode,
               trNode->getSymbol()->isStatic() || trNode->getSymbol()->isMethodMetaData(),
               "direct store to non-auto, non-static, non-MethodMetaData");

@jdmpapin
Copy link
Contributor

jdmpapin commented Apr 1, 2025

Interesting. Is IR considering transforming a loop that contains an exception handler?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants