Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Cache jstrings during metrics collection #1029

Merged
merged 7 commits into from
Oct 26, 2024

Conversation

mbutrovich
Copy link
Contributor

@mbutrovich mbutrovich commented Oct 22, 2024

Which issue does this PR close?

Partially addresses #1024.

Rationale for this change

Comet uses JNI jstrings as the keys to updating metrics values on the Spark side during execution. As described in #1024, currently Comet allocates a jstring for every metric for every invocation of metrics updating. The calls to jni_NewStringUTF account for over 1% of the on-CPU time in TPC-H SF10 for me.

What changes are included in this PR?

Added a HashMap that maps the native string to a jstring to use in JNI calls. This has the benefit of being many-to-one, whereby multiple nodes with the same metric name will benefit from the cached jstring. This cache is populated on demand: if the entry isn't present, we allocate a jstring and insert it into the cache.

I have some thoughts about this approach that I would love for reviewers to comment on:

  1. I wanted to populate the cache in advance, maybe do a plan traversal when it's generated in ExecutePlan. However, DF's metrics are Options and don't actually appear to be there until the plan starts executing.
  2. What is the thread safety of this approach? It's unclear to me if multiple threads could be sharing this call stack and trying to write new values into the cache at the same time. I could wrap the HashMap in a latch in exchange for a performance hit, but would like to understand if this is even possible.
  3. I think I understood the jni crate's docs with respect to GlobalRef, but a sanity check on if this approach could hold references longer than we want (and leak) would be helpful.

How are these changes tested?

Existing tests on the Java side that exercise metrics.

mbutrovich and others added 5 commits October 21, 2024 15:38
…locations. I need to confirm 1) there's actually a performance benefit to this, and 2) these GlobalRefs are being released when I want them to be.
@mbutrovich
Copy link
Contributor Author

I will update with some benchmark results tomorrow, but initial results look promising.

@andygrove
Copy link
Member

I ran some benchmarks locally and confirmed a speedup:

tpch_allqueries

tpch_queries_speedup

@andygrove
Copy link
Member

The speedup on q4 is pretty impressive!

Here are the raw JSON benchmark result files:

baseline.json

pr1029.json

@mbutrovich
Copy link
Contributor Author

Thanks for running the benchmarks for me. I was struggling to get reproducible results locally.

@andygrove
Copy link
Member

andygrove commented Oct 24, 2024

I wonder why there is such a large regression with q72 though

edit: posted the wrong pngs from the wrong benchmark - updated now

@andygrove
Copy link
Member

  1. I think I understood the jni crate's docs with respect to GlobalRef, but a sanity check on if this approach could hold references longer than we want (and leak) would be helpful.

This seems correct to me

@andygrove
Copy link
Member

I had earlier posted fresh benchmarks that showed a big improvement with the latest commit but I had inadvertently enabled the new replaceSortMergeJoin feature. I ran again without that enabled and essentially see the same results as the original run (367 seconds versus the earlier 365, which is likely just noise).

@andygrove
Copy link
Member

  1. What is the thread safety of this approach? It's unclear to me if multiple threads could be sharing this call stack and trying to write new values into the cache at the same time. I could wrap the HashMap in a latch in exchange for a performance hit, but would like to understand if this is even possible.

Spark has a single thread calling CometExecIterator, which in turn calls createPlan, executePlan, and releasePlan, so think the current approach is safe.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @mbutrovich

@andygrove andygrove merged commit d670af7 into apache:main Oct 26, 2024
75 checks passed
@mbutrovich mbutrovich deleted the cache_jstrings branch October 28, 2024 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants