perf: Cache jstrings during metrics collection #1029

mbutrovich · 2024-10-22T22:00:55Z

Which issue does this PR close?

Partially addresses #1024.

Rationale for this change

Comet uses JNI jstrings as keys when updating metric values on the Spark side during execution. As described in #1024, Comet currently allocates a jstring for every metric on every invocation of metrics updating. In my TPC-H SF10 runs, the calls to jni_NewStringUTF account for over 1% of on-CPU time.

What changes are included in this PR?

Added a HashMap that maps the native string to a jstring to use in JNI calls. This has the benefit of being many-to-one, whereby multiple nodes with the same metric name will benefit from the cached jstring. This cache is populated on demand: if the entry isn't present, we allocate a jstring and insert it into the cache.
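The on-demand population described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `FakeGlobalRef` is a hypothetical stand-in for the jni crate's `GlobalRef` so the example compiles without a JVM, and `JStringCache`, `get_or_create`, and the `allocations` counter are names invented for the sketch.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a JNI global reference to a Java string.
/// The real code would hold a handle obtained through the jni crate.
#[derive(Debug)]
pub struct FakeGlobalRef(pub String);

/// Hypothetical cache: native metric name -> shared handle to the Java-side string.
#[derive(Default)]
pub struct JStringCache {
    entries: HashMap<String, Arc<FakeGlobalRef>>,
    /// Number of cache misses, i.e. how often an allocation was needed.
    pub allocations: usize,
}

impl JStringCache {
    /// Returns the cached handle for `name`, allocating one only on the first lookup.
    pub fn get_or_create(&mut self, name: &str) -> Arc<FakeGlobalRef> {
        if let Some(existing) = self.entries.get(name) {
            // Cache hit: hand out another reference to the same handle.
            return Arc::clone(existing);
        }
        // Cache miss: this is where a jstring would be allocated and
        // promoted to a global reference in a real JNI setting.
        self.allocations += 1;
        let created = Arc::new(FakeGlobalRef(name.to_owned()));
        self.entries.insert(name.to_owned(), Arc::clone(&created));
        created
    }
}
```

Because the map is keyed by metric name rather than by plan node, any number of nodes reporting the same metric name share one entry, which is the many-to-one property mentioned above.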

I have some thoughts about this approach that I would love for reviewers to comment on:

- I wanted to populate the cache in advance, maybe do a plan traversal when it's generated in ExecutePlan. However, DF's metrics are Options and don't actually appear to be there until the plan starts executing.
- What is the thread safety of this approach? It's unclear to me if multiple threads could be sharing this call stack and trying to write new values into the cache at the same time. I could wrap the HashMap in a latch in exchange for a performance hit, but would like to understand if this is even possible.
- I think I understood the jni crate's docs with respect to GlobalRef, but a sanity check on if this approach could hold references longer than we want (and leak) would be helpful.
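For the thread-safety question, the lock-protected alternative could look roughly like the sketch below. It is a hypothetical variant and not what this PR implements: `FakeGlobalRef` again stands in for a JNI global reference, and `SharedJStringCache` is an invented name. Every lookup takes the lock, which is the performance cost mentioned above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a JNI global reference to a Java string.
pub struct FakeGlobalRef(pub String);

/// Hypothetical lock-protected cache that can be shared across threads.
#[derive(Default)]
pub struct SharedJStringCache {
    entries: Mutex<HashMap<String, Arc<FakeGlobalRef>>>,
}

impl SharedJStringCache {
    /// Looks up or creates the handle for `name` while holding the lock,
    /// so two threads can never insert competing entries for one name.
    pub fn get_or_create(&self, name: &str) -> Arc<FakeGlobalRef> {
        let mut entries = self.entries.lock().expect("cache lock poisoned");
        // Note: building the key allocates a String on every call; a
        // get-then-insert pattern would avoid that on cache hits.
        let handle = entries
            .entry(name.to_owned())
            .or_insert_with(|| Arc::new(FakeGlobalRef(name.to_owned())));
        Arc::clone(handle)
    }

    /// Number of distinct metric names currently cached.
    pub fn len(&self) -> usize {
        self.entries.lock().expect("cache lock poisoned").len()
    }
}
```

If only one thread ever drives a given plan, the lock is pure overhead and the unsynchronized map is sufficient.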

How are these changes tested?

Existing tests on the Java side that exercise metrics.

…locations. I need to confirm 1) there's actually a performance benefit to this, and 2) these GlobalRefs are being released when I want them to be.

mbutrovich · 2024-10-22T22:02:38Z

I will update with some benchmark results tomorrow, but initial results look promising.

native/core/src/execution/metrics/utils.rs

andygrove · 2024-10-23T23:43:31Z

I ran some benchmarks locally and confirmed a speedup:

andygrove · 2024-10-23T23:45:23Z

The speedup on q4 is pretty impressive!

Here are the raw JSON benchmark result files:

baseline.json

pr1029.json

native/core/src/execution/jni_api.rs

mbutrovich · 2024-10-24T12:40:18Z

Thanks for running the benchmarks for me. I was struggling to get reproducible results locally.

andygrove · 2024-10-24T15:47:33Z

~~I wonder why there is such a large regression with q72 though~~

edit: posted the wrong pngs from the wrong benchmark - updated now

andygrove · 2024-10-24T17:07:48Z

> I think I understood the jni crate's docs with respect to GlobalRef, but a sanity check on if this approach could hold references longer than we want (and leak) would be helpful.

This seems correct to me.

andygrove · 2024-10-24T17:50:52Z

I had earlier posted fresh benchmarks that showed a big improvement with the latest commit but I had inadvertently enabled the new replaceSortMergeJoin feature. I ran again without that enabled and essentially see the same results as the original run (367 seconds versus the earlier 365, which is likely just noise).

andygrove · 2024-10-24T18:58:58Z

> What is the thread safety of this approach? It's unclear to me if multiple threads could be sharing this call stack and trying to write new values into the cache at the same time. I could wrap the HashMap in a latch in exchange for a performance hit, but would like to understand if this is even possible.

Spark has a single thread calling CometExecIterator, which in turn calls createPlan, executePlan, and releasePlan, so I think the current approach is safe.

andygrove

LGTM. Thanks @mbutrovich

mbutrovich and others added 5 commits October 21, 2024 15:38

Attempt at caching Jstrings as GlobalRefs in a HashMap to reduce real…

a6b29ef

…locations. I need to confirm 1) there's actually a performance benefit to this, and 2) these GlobalRefs are being released when I want them to be.

Minor refactor and added more docs.

e7366d6

Undo import reordering to reduce diff.

fc4d4c6

Docs.

4003fcd

Merge branch 'apache:main' into cache_jstrings

bbe1669

andygrove reviewed Oct 22, 2024

native/core/src/execution/metrics/utils.rs (outdated)

Avoid get() by just cloning the Arc to globalref on insert.

40c1b02

andygrove reviewed Oct 24, 2024

native/core/src/execution/jni_api.rs (outdated)

Store jstring cache in ExecutionContext.

40c3191

andygrove approved these changes Oct 24, 2024

mbutrovich mentioned this pull request Oct 25, 2024

Reduce metrics collection overhead #1024

Open

andygrove merged commit d670af7 into apache:main Oct 26, 2024
75 checks passed

mbutrovich deleted the cache_jstrings branch October 28, 2024 14:44
