unmarshaller performance: use bulk operations instead of charAt() where possible #1873

winfriedgerlach · 2025-11-07T13:11:19Z

assuming the minimum Java version 11 is correct (as mentioned in the 4.0.3 release notes), this should be a viable performance improvement for the very common case of processing a String event from the SAX parser

laurentschoelens

Maybe simplify the code to limit the number of instanceof checks?

laurentschoelens · 2025-11-07T13:25:13Z

...src/main/java/org/glassfish/jaxb/runtime/v2/runtime/unmarshaller/ValidatingUnmarshaller.java

+        else if (pcdata instanceof StringBuilder) {
+            ((StringBuilder) pcdata).getChars(0, len, buf, 0);
+        }
+        else if (pcdata instanceof StringBuffer) {
+            ((StringBuffer) pcdata).getChars(0, len, buf, 0);
+        }


laurentschoelens · 2025-11-07

Could you use instanceof AbstractStringBuilder, since it's extended by both StringBuilder and StringBuffer?

winfriedgerlach · 2025-11-07

I would like to, but that class is not public...

laurentschoelens · 2025-11-07

Oops, yes, you're right :)

jansupol · 2025-11-08

CharSequence is what you need, implemented by String, StringBuilder, StringBuffer...

laurentschoelens · 2025-11-08

Yes, but:

- it's already the method argument type, i.e. the type of pcdata
- it doesn't provide the optimized getChars method

jansupol · 2025-11-09

Ah, sorry. getChars is there since JDK 25 only. I missed that.
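For readers following the review thread, the dispatch being discussed can be sketched in isolation. This is only an illustration, not the code from the PR: the class name `BulkCopy` and the method `copyChars` are hypothetical, and the sketch assumes a Java 11 baseline, where `getChars` is available on `String`, `StringBuilder` and `StringBuffer` but not on `CharSequence` itself. Any other `CharSequence` falls back to the per-character loop.

```java
// Hypothetical helper, not part of jaxb-ri: copies a CharSequence into a char[]
// using bulk getChars() where the concrete type offers it on Java 11.
final class BulkCopy {
    private BulkCopy() {}

    static char[] copyChars(CharSequence pcdata) {
        int len = pcdata.length();
        char[] buf = new char[len];
        if (pcdata instanceof String) {
            // bulk copy, no per-character bounds check
            ((String) pcdata).getChars(0, len, buf, 0);
        } else if (pcdata instanceof StringBuilder) {
            ((StringBuilder) pcdata).getChars(0, len, buf, 0);
        } else if (pcdata instanceof StringBuffer) {
            ((StringBuffer) pcdata).getChars(0, len, buf, 0);
        } else {
            // generic fallback: one charAt() call per character
            for (int i = 0; i < len; i++) {
                buf[i] = pcdata.charAt(i);
            }
        }
        return buf;
    }
}
```

The three `instanceof` branches cannot be merged on Java 11 because the common superclass of `StringBuilder` and `StringBuffer` is not public, as noted above.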

laurentschoelens · 2025-11-07T13:27:40Z

@winfriedgerlach : you should also sign eclipsefnd/eca for this to be merged by owners of jaxb-ri repository

winfriedgerlach · 2025-11-07T15:20:15Z

...src/main/java/org/glassfish/jaxb/runtime/v2/runtime/unmarshaller/ValidatingUnmarshaller.java

+            ((StringBuffer) pcdata).getChars(0, len, buf, 0);
+        }
+        else if (pcdata instanceof Pcdata) {
+            ((Pcdata) pcdata).writeTo(buf, 0);


looking at the implementation, this should give massive gains for Base64Data, but IntArrayData and IntData also benefit

laurentschoelens

LGTM

winfriedgerlach · 2025-11-08T12:07:54Z

> @winfriedgerlach : you should also sign eclipsefnd/eca for this to be merged by owners of jaxb-ri repository

@laurentschoelens done

laurentschoelens · 2025-11-08T15:55:05Z

It would be interesting to have a benchmark for this, even a small test case, to compare profiling results for the current 4.0.6 and the patched version.

winfriedgerlach · 2025-11-10T08:01:43Z

> It would be interesting to have a benchmark for this, even a small test case, to compare profiling results for the current 4.0.6 and the patched version.

@laurentschoelens you are completely right, the only problem was that changing the code takes five minutes and benchmarking done right takes hours... So I benchmarked quite a lot over the weekend, and it turns out that this change does not move the needle at all in the grand scheme of things for our use-case (parsing lots of XML files with SAX and XML Schema validation). I tried Java 21, 25, and 17, and the results were all within the margin of error.

I didn't test the Base64 use-case though.

I am fine if you close this PR as "premature optimization". I can also investigate further if this helps anyone.

laurentschoelens · 2025-11-10T10:13:26Z

No, I think this is better-optimized code than the current one. What did you look at? Run time? CPU? Memory?

winfriedgerlach · 2025-11-10T12:29:55Z

> No, I think this is better-optimized code than the current one. What did you look at? Run time? CPU? Memory?

I wrote a JMH test that parsed ~5,000 XML documents from memory and looked at the "operations per second" metric. I tried with both JDK default and Woodstox parser (the latter is ~2x faster...). I fully agree that the code change proposed in this PR will make things faster, the question is only whether performance in this area is relevant in the overall parse-and-validate process.

I can do some additional benchmark runs tonight where I only look at the changed code section in isolation.

laurentschoelens · 2025-11-10T17:57:04Z

Could you share the benchmark code you wrote?

winfriedgerlach · 2025-11-10T21:46:41Z

@laurentschoelens OK, here are some micro benchmarks that really only test the for ... charAt() vs. getChars()/writeTo() part. All tests have been performed on a Ryzen 7700 desktop under Windows 11 and temurin 21.0.5+11-LTS.

100_000 Strings of length 10: speedup ~1.85
Benchmark                 Mode  Cnt     Score     Error  Units
Isolated.stringCharAt    thrpt   15  1108,888 ±  51,021  ops/s
Isolated.stringGetChars  thrpt   15  2063,215 ± 116,811  ops/s

100_000 String*Builders* of length 10: speedup ~1.7
Benchmark                        Mode  Cnt     Score    Error  Units
IsolatedStringBuilder.charAt    thrpt   15  1304,567 ± 56,170  ops/s
IsolatedStringBuilder.getChars  thrpt   15  2231,755 ± 67,601  ops/s

100_000 IntData of random 0..999_999: speedup ~4.5
Benchmark                         Mode  Cnt     Score    Error  Units
IsolatedIntData.intDataCharAt    thrpt   15   223,689 ±  0,836  ops/s
IsolatedIntData.intDataGetChars  thrpt   15  1020,029 ± 21,783  ops/s

1 IntArrayData with 10 ints of random 0..999_999: speedup ~1.1
Benchmark                              Mode  Cnt         Score         Error  Units
IsolatedIntArrayData.intArrayCharAt   thrpt   15  85888294,441 ± 2359727,336  ops/s
IsolatedIntArrayData.intArrayWriteTo  thrpt   15  93751403,476 ±  772306,405  ops/s

100_000 IntArrayData, each containing 10 random ints in the range 0..999_999: speedup ~11
Benchmark                              Mode  Cnt    Score    Error  Units
IsolatedIntArrayData.intArrayCharAt   thrpt   15   49,950 ±  0,590  ops/s
IsolatedIntArrayData.intArrayWriteTo  thrpt   15  537,718 ± 14,146  ops/s

100_000 Base64Data of 1000 bytes: speedup ~2.3
Benchmark                          Mode  Cnt   Score   Error  Units
IsolatedBase64Data.base64CharAt   thrpt   15   5,714 ± 0,312  ops/s
IsolatedBase64Data.base64WriteTo  thrpt   15  13,032 ± 0,223  ops/s

About the same result when going to 10_000 bytes length:
Benchmark                          Mode  Cnt  Score   Error  Units
IsolatedBase64Data.base64CharAt   thrpt   15  0,590 ± 0,023  ops/s
IsolatedBase64Data.base64WriteTo  thrpt   15  1,366 ± 0,019  ops/s

Things to note:

- IntData is so slow because it doesn't cache its toString() calls (as opposed to IntArrayData.getLiteral()).
- I have serious doubts whether IntData is used at all due to blatant bugs in its sizeTable --> I was only able to conduct the benchmark with a patched version that calculates stringSizeOfInt(54321) = 5 instead of 6...
  - see https://github.com/openjdk/jdk6/blob/3e49aa876353eaa215cde71eb21acc9b7f9872a0/jdk/src/share/classes/java/lang/Integer.java#L377 for a correct table (which was probably the "inspiration" for the code in IntData)
  - note that newer Java versions feature a more efficient implementation
- I do not really have an explanation for why IntArrayData performs so badly. I couldn't find a reason in the source code that would explain such a massive speedup --> maybe I made a mistake? But again, is IntArrayData even really used anywhere?
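To make the sizeTable point concrete, here is a minimal sketch of a digit-count lookup modeled on the JDK 6 `Integer.stringSize` table linked above. It is not the IntData code and not the patch used for the benchmark: the class name `IntDigits` is hypothetical, and the handling of negative values is an addition for completeness (the JDK 6 method expects a non-negative argument).

```java
// Hypothetical sketch: counts the characters Integer.toString(x) would produce.
final class IntDigits {
    private IntDigits() {}

    // Largest value with 1, 2, ..., 10 decimal digits.
    private static final int[] SIZE_TABLE = {
        9, 99, 999, 9_999, 99_999, 999_999, 9_999_999,
        99_999_999, 999_999_999, Integer.MAX_VALUE
    };

    static int stringSizeOfInt(int x) {
        if (x == Integer.MIN_VALUE) {
            return 11; // "-2147483648"; cannot be negated without overflow
        }
        if (x < 0) {
            return 1 + stringSizeOfInt(-x); // one extra char for the sign
        }
        // First table entry that is >= x gives the digit count.
        for (int i = 0; ; i++) {
            if (x <= SIZE_TABLE[i]) {
                return i + 1;
            }
        }
    }
}
```

With a table of this shape, `stringSizeOfInt(54321)` yields 5, matching `Integer.toString(54321).length()`.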

IsolatedString.java
IsolatedStringBuilder.java
IsolatedIntData.java
IntData.java
IsolatedIntArrayData.java
IsolatedBase64Data.java

laurentschoelens · 2025-11-11T14:46:16Z

About IntArrayData, from what I see

It extends Pcdata, so calling writeTo(char[] buf, int start) uses the default implementation of that method:

    public void writeTo(char[] buf, int start) {
        toString().getChars(0,length(),buf,start);
    }

the toString() method in IntArrayData calls the following

    @Override
    public String toString() {
        return literal.toString();
    }

Sadly, if none of length, charAt or subSequence is called before toString, the literal field will still be null.

Maybe you've found a bug; I guess the toString method should call getLiteral instead of reading the potentially null field directly.
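The lazy-initialization pitfall described here can be illustrated with a small standalone class. This is a sketch under assumptions, not the IntArrayData source: the class `LazyIntList`, its space-separated text form, and its `writeTo` method are hypothetical stand-ins. The point is only that every accessor, including `toString()`, goes through `getLiteral()`, so the order of calls no longer matters.

```java
// Hypothetical stand-in for a Pcdata-like type whose text form is built lazily.
final class LazyIntList implements CharSequence {
    private final int[] data;
    private StringBuilder literal; // null until first needed

    LazyIntList(int... data) {
        this.data = data.clone();
    }

    // Builds the text form once and caches it.
    private StringBuilder getLiteral() {
        if (literal == null) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < data.length; i++) {
                if (i > 0) {
                    sb.append(' '); // separator is an assumption of this sketch
                }
                sb.append(data[i]);
            }
            literal = sb;
        }
        return literal;
    }

    @Override public int length() { return getLiteral().length(); }

    @Override public char charAt(int index) { return getLiteral().charAt(index); }

    @Override public CharSequence subSequence(int start, int end) {
        return getLiteral().subSequence(start, end);
    }

    // Goes through getLiteral(), so it is safe even as the very first call.
    @Override public String toString() { return getLiteral().toString(); }

    // Bulk copy into buf without an intermediate String.
    void writeTo(char[] buf, int start) {
        StringBuilder sb = getLiteral();
        sb.getChars(0, sb.length(), buf, start);
    }
}
```

Calling `new LazyIntList(1, 22, 333).toString()` as the first operation returns the text rather than throwing, which is the behavior the direct field access cannot guarantee.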

laurentschoelens · 2025-11-11T14:47:03Z

I'll try to look further at the rest of your code, but that's good news for performance (µs are still µs gained 😄)

winfriedgerlach · 2025-11-11T15:14:40Z

@laurentschoelens good finding, I would definitely prefer getLiteral() there instead of direct field access!
But I'm not sure whether this fully explains the speedup, because I do call .length() on each IntArrayData in the setup() method of my JMH benchmark... (cf. IsolatedIntArrayData.java)

Note that there are no usages of IntArrayData in this repo, so we probably shouldn't put much effort in here...

laurentschoelens · 2025-11-11T15:24:16Z

If you didn't call the length method, it would definitely fail with an NPE.

I guess having the charAt method call getLiteral for each character, with the underlying charAt then doing checkIndex, would slow this down as much as observed:

    @Override
    public char charAt(int index) {
        return getLiteral().charAt(index);
    }

    @Override
    public char charAt(int index) {
        checkIndex(index, count);
        if (isLatin1()) {
            return (char)(value[index] & 0xff);
        }
        return StringUTF16.charAt(value, index);
    }

winfriedgerlach · 2025-11-14T16:48:17Z

I found this in the Woodstox source code regarding the same matter:

nice optimization!

laurentschoelens · 2025-11-14T21:09:46Z

> I found this in the Woodstox source code regarding the same matter:
>
> nice optimization!

In which context is this done?
I mean, is it better to use charAt in some places and getChars in others here too?

laurentschoelens reviewed Nov 7, 2025


winfriedgerlach changed the title ~~unmarshaller performance: use bulk getChars() where possible~~ unmarshaller performance: use bulk operations instead of charAt() where possible Nov 7, 2025

winfriedgerlach commented Nov 7, 2025


laurentschoelens approved these changes Nov 7, 2025


unmarshaller performance: use bulk operations where possible

caf8ed2

winfriedgerlach force-pushed the improve-perfomance-of-unmarshaller branch from 47e3079 to caf8ed2 Compare November 9, 2025 08:59
