Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYSTEMDS-3534 Cache for build phase in bin and recode encoder #2036

Open
wants to merge 99 commits into
base: main
Choose a base branch
from

Conversation

Mayaryin
Copy link

We implemented a caffeine cache that is currently used by the bin and recode encoders. We tested and compared the runtime with and without caching in a seperate test class. We separate a first run of encoding which initializes the cache and takes longer from the rest of our measurements. We average over 10 runs for each setting.

@Mayaryin Mayaryin marked this pull request as ready for review June 25, 2024 19:52
@phaniarnab
Copy link
Contributor

Thanks for the changes, @Mayaryin and @ingunnaf. I will have a detailed look in the next days.
From the quick look, it is not the right idea to use an external library for caching. We want to build our own caching and cache management techniques for encoder intermediates. That would allow for a holistic integration to SystemDS' compiler and runtime infrastructure.

@codecov-commenter
Copy link

codecov-commenter commented Jul 4, 2024

Codecov Report

Attention: Patch coverage is 84.26396% with 31 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@f3e0926). Learn more about missing BASE report.

Files Patch % Lines
...sds/runtime/transform/encode/EncodeBuildCache.java 86.84% 7 Missing and 3 partials ⚠️
.../sysds/runtime/transform/encode/BinBoundaries.java 68.42% 4 Missing and 2 partials ⚠️
...sysds/runtime/transform/encode/EncodeCacheKey.java 70.00% 3 Missing and 3 partials ⚠️
...sds/runtime/transform/encode/EncodeCacheEntry.java 66.66% 4 Missing and 1 partial ⚠️
...sds/runtime/transform/encode/ColumnEncoderBin.java 91.66% 1 Missing and 1 partial ⚠️
.../apache/sysds/runtime/matrix/data/MatrixBlock.java 0.00% 1 Missing ⚠️
.../apache/sysds/runtime/transform/encode/RCDMap.java 85.71% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2036   +/-   ##
=======================================
  Coverage        ?   68.84%           
  Complexity      ?    40755           
=======================================
  Files           ?     1447           
  Lines           ?   161755           
  Branches        ?    31432           
=======================================
  Hits            ?   111363           
  Misses          ?    41315           
  Partials        ?     9077           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Mayaryin
Copy link
Author

Mayaryin commented Jul 8, 2024

We added a unit test and a tests for multithreading. We attempted to test the cache via a dml script and a test dataset, but couldnt succeed. Our test results from the inner tests as well as the test data generation script and the dml scripts can be found in this repository: https://github.com/Mayaryin/systemds_test_data

@phaniarnab
Copy link
Contributor

We added a unit test and a tests for multithreading. We attempted to test the cache via a dml script and a test dataset, but couldnt succeed. Our test results from the inner tests as well as the test data generation script and the dml scripts can be found in this repository: https://github.com/Mayaryin/systemds_test_data

Thanks @Mayaryin. I will take a look. Can you please summarize the performance comparison of single-threaded, multi-threaded, with and without cache in this PR to track it in the same place? You can maintain the scripts and setups in the other repository, but just mention the speedups here.

@phaniarnab
Copy link
Contributor

Also, please ensure that all tests pass.

@Mayaryin
Copy link
Author

Mayaryin commented Jul 8, 2024

I am unsure which tests exactly fail. When building locally the build fails due to javadoc warnings, this can be avoided by adding false in the javadoc plugin configuration in the POM. When looking at the warnings it seems that there are comments missing in many classes, but not restricted to the ones we added. Could we have a look at it together in a call this week?

@phaniarnab
Copy link
Contributor

I am unsure which tests exactly fail. When building locally the build fails due to javadoc warnings, this can be avoided by adding false in the javadoc plugin configuration in the POM. When looking at the warnings it seems that there are comments missing in many classes, but not restricted to the ones we added. Could we have a look at it together in a call this week?

Looks like the builtin.part2 is failing. You can see the back trace if you click Details. Take a look and try to reproduce the failure locally by running the same test. If still unsure, we can have a call later this week. Btw, I reran the tests to avoid any intermittent issues.

@Mayaryin
Copy link
Author

Mayaryin commented Jul 8, 2024

Test results

Testing only the build phase: averaging over 60 runs (columns) minus the first 10, 10000 rows
cache is explicitly disabled/enabled
RECODE no cache: 2,104 ms
RECODE cache: 1,022 ms
BIN no cache: 0,127 ms
BIN cache: 0,0825 ms

Multithreading: averaging over 60 runs (columns) minus the first 10, 10000 rows
cache is not explicitly disabled, the execution times refer to the whole encode process including build and apply
image

@Mayaryin
Copy link
Author

Mayaryin commented Jul 8, 2024

I've seen that there is a problem with the dedup function but this also occurs in at least one other PR

@phaniarnab
Copy link
Contributor

All of the tests have passed now.

@phaniarnab
Copy link
Contributor

Thank you, @Mayaryin, @ingunnaf for your contribution.
With some changes, I see a 2x speedup for real use cases, which is very good.
I will not merge this PR immediately, as the changes are in the critical path of transformencode and may impact other running projects. However, after improving the robustness of this feature, I will merge it before the next release.

List of TODOs include:

  • Integrate the lineage trace of the input frame into the key of the metadata cache, either just by adding the hash/checksum of the lineage trace or by making the build tasks lineage traceable. This extension will avoid incorrect reuse if the input frame is modified.
  • The number of bins may need to be added to the key to avoid incorrect reuse for different number of bins
  • Robustness of the hash function.

Future work outside the scope of this PR:

  • Caching and reuse apply task results, which requires effective output allocation strategy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

4 participants