-
Notifications
You must be signed in to change notification settings - Fork 122
unmarshaller performance: use bulk operations instead of charAt() where possible #1873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
unmarshaller performance: use bulk operations instead of charAt() where possible #1873
Conversation
laurentschoelens
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe simplify code for limiting instanceof ?
| else if (pcdata instanceof StringBuilder) { | ||
| ((StringBuilder) pcdata).getChars(0, len, buf, 0); | ||
| } | ||
| else if (pcdata instanceof StringBuffer) { | ||
| ((StringBuffer) pcdata).getChars(0, len, buf, 0); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use instanceof AbstractStringBuilder since it's extented by both StringBuilder and StringBuffer ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would like to but that class is not public...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oups, yes you're right :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CharSequence is what you need, implemented by String, StringBuilder, StringBuffer...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but
- it's already method argument and type of pcdata
- it doesn't bring optimized method
getChars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry. getChars is there since JDK 25 only. I did miss that.
|
@winfriedgerlach : you should also sign eclipsefnd/eca for this to be merged by owners of jaxb-ri repository |
| ((StringBuffer) pcdata).getChars(0, len, buf, 0); | ||
| } | ||
| else if (pcdata instanceof Pcdata) { | ||
| ((Pcdata) pcdata).writeTo(buf, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at the implementation, this should give massive gains for Base64Data, but IntArrayData and IntData also benefit
laurentschoelens
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@laurentschoelens done |
|
That would be interesting to have benchmark on this, even a small test case, to see profiling results on actual 4.0.6 and patched one |
47e3079 to
caf8ed2
Compare
@laurentschoelens you are completely right, the only problem was that changing the code takes five minutes and benchmarking done right takes hours... So I benchmarked quite a lot at the weekend at it turns out that this change does not move the needle at all in the grand scheme of things for our use-case (parsing lots of XML files with SAX and XML Schema validation). I tried Java 21, 25, and 17, and the results were all within the margin of error. I didn't test the Base64 use-case though. I am fine if you close this PR as "premature optimization". I can also investigate further if this helps anyone. |
|
No I think that should be better optimized code than actual one. What did you look at ? Run time ? Cpu ? Memory ? |
I wrote a JMH test that parsed ~5,000 XML documents from memory and looked at the "operations per second" metric. I tried with both JDK default and Woodstox parser (the latter is ~2x faster...). I fully agree that the code change proposed in this PR will make things faster, the question is only whether performance in this area is relevant in the overall parse-and-validate process. I can do some additional benchmark runs tonight where I only look at the changed code section in isolation. |
|
Could you share the benchmark code you made ? |
|
@laurentschoelens OK, here are some micro benchmarks that really only test the Things to note:
IsolatedString.java |
|
About IntArrayData, from what I see
public void writeTo(char[] buf, int start) {
toString().getChars(0,length(),buf,start);
}
@Override
public String toString() {
return literal.toString();
}
Maybe you pointed out a bug but I guess the |
|
I'll try to look further at the rest of your code but that's good news for performance overhead (us are still us gained 😄) |
|
@laurentschoelens good finding, I would definitely prefer Note that there are no usages of |
|
If you didn't call the I guess having @Override
public char charAt(int index) {
return getLiteral().charAt(index);
} @Override
public char charAt(int index) {
checkIndex(index, count);
if (isLatin1()) {
return (char)(value[index] & 0xff);
}
return StringUTF16.charAt(value, index);
} |

assuming the minimum Java version 11 is correct (as mentioned in the 4.0.3 release notes), this should be a viable performance improvement for the very common case of processing a String event from the SAX parser