Adapt EigenDecomposition implementation #2038

Open · wants to merge 6 commits into base: main from naive-impl

Conversation

@Aast12 commented Jun 30, 2024

Implements the Implicit QL Algorithm, also used by the Commons Math EigenDecomposition class, using SystemDS data structures. Note that this assumes the matrix is symmetric and fits in a single block of a DenseBlock. The latter is also assumed by the current computeEigen function, so the only missing functionality is support for non-symmetric matrices. The implementation focuses on reducing allocations as much as possible in order to match or exceed the performance of the current eigen decomposition function across most cases.
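For reference, a minimal usage sketch of the new entry point, assuming the `computeEigenDecompositionSymm(MatrixBlock in, int threads)` signature quoted later in this review; which index of the returned array holds the eigenvalues and which holds the eigenvectors is an assumption here, not confirmed by the PR:

```java
import org.apache.sysds.runtime.matrix.data.LibCommonsMath;
import org.apache.sysds.runtime.matrix.data.MatrixBlock;
import org.apache.sysds.runtime.util.DataConverter;

public class SymmEigenExample {
    public static void main(String[] args) {
        // Small symmetric input that fits in a single dense block,
        // as required by both the old and the new code path.
        double[][] a = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};
        MatrixBlock in = DataConverter.convertToMatrixBlock(a);

        // Decompose with 4 threads; the index assignment below (values
        // first, vectors second) is an assumption, check the javadoc.
        MatrixBlock[] out = LibCommonsMath.computeEigenDecompositionSymm(in, 4);
        MatrixBlock eigenValues = out[0];
        MatrixBlock eigenVectors = out[1];
        System.out.println("first eigenvalue: " + eigenValues.getValue(0, 0));
        System.out.println("V[0,0] = " + eigenVectors.getValue(0, 0));
    }
}
```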

As an initial benchmark, a test was run with Symmetric Positive Definite and tridiagonal (symmetric) matrices of sizes 100x100, 500x500 and 1000x1000. The following table shows the test cases and the number of iterations run per test.

| Test case | Iterations |
|---|---|
| spdTest100 | 300 |
| spdTest500 | 100 |
| spdTest1000 | 30 |
| tridiagonalTest100 | 300 |
| tridiagonalTest500 | 100 |
| tridiagonalTest1000 | 30 |

The following table shows the benchmark results, with an average improvement of 48.95% across all cases. The largest improvements occur on the 1000x1000 matrices, up to 83.61%.

| # | Instruction | Iterations | Time(s) Old | Time(s) New | Improvement |
|---|---|---|---|---|---|
| 1 | eigen | 3060 | 631.230 | 423.791 | 48.95% |
| 2 | tridiagonalTest500 | 500 | 201.269 | 143.769 | 40.00% |
| 3 | spdTest500 | 500 | 185.826 | 138.117 | 34.54% |
| 4 | tridiagonalTest1000 | 30 | 129.213 | 70.370 | 83.61% |
| 5 | spdTest1000 | 30 | 108.221 | 66.619 | 62.44% |
| 6 | spdTest100 | 1000 | 4.075 | 2.840 | 43.48% |
| 7 | tridiagonalTest100 | 1000 | 3.470 | 3.088 | 12.37% |

@Baunsgaard (Contributor):

Hi!

A good start to the PR.
I suggest you squash your commits and then rebase your changes on master.

That should fix your merge conflict.

@Aast12 force-pushed the naive-impl branch 3 times, most recently from 4ac147c to 177921a on June 30, 2024 21:46
@Aast12 (Author) commented Jun 30, 2024

> Hi!
>
> A good start to the PR. I suggest you squash your commits and then rebase your changes on master.
>
> That should fix your merge conflict.

It should be fixed now.

@codecov-commenter commented Jul 1, 2024

Codecov Report

Attention: Patch coverage is 86.03352% with 25 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@6a3ff8c). Learn more about missing BASE report.

| Files | Patch % | Lines |
|---|---|---|
| ...ache/sysds/runtime/matrix/data/LibCommonsMath.java | 86.03% | 12 Missing and 13 partials ⚠️ |
Additional details and impacted files
```
@@           Coverage Diff           @@
##             main    #2038   +/-   ##
=======================================
  Coverage        ?   68.80%
  Complexity      ?    40731
=======================================
  Files           ?     1440
  Lines           ?   161743
  Branches        ?    31461
=======================================
  Hits            ?   111282
  Misses          ?    41388
  Partials        ?     9073
```


@Aast12 marked this pull request as ready for review on July 1, 2024 09:47
@Baunsgaard (Contributor) left a comment:

The implementation looks suspiciously close to the Commons Math implementation, with minor modifications. There are some key elements that make it faster, but I think we can refine it a bit.

```java
// double xNormSqr = householderVectors.slice(k, k, k + 1, m - 1).sumSq();
// double xNormSqr = LibMatrixMult.dotProduct(hv, hv, k_kp1, k_kp1, m - (k + 1));
double xNormSqr = 0;
for (int j = k + 1; j < m; ++j) {
```
@Baunsgaard (Contributor):

This looks like a squared row sum; this is an operation you can perform directly.

@Aast12 (Author):

We tried these other approaches to get a performance boost, but they either did not change the performance or made it slower:

```java
double xNormSqr = householderVectors.slice(k, k, k + 1, m - 1).sumSq();
double xNormSqr = LibMatrixMult.dotProduct(hv, hv, k * m + k + 1, k * m + k + 1, m - (k + 1));
```

Are you referring to another way to do it directly?

@Baunsgaard (Contributor):

Okay, then we do not change it.
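For reference, a self-contained sketch of the squared row sum discussed in this thread, assuming `hv` is the row-major linearized dense array of `householderVectors` (so row `k` starts at offset `k * m`, matching the offsets in the `dotProduct` call above); the loop body is filled in here only as an illustration, since the diff context quoted earlier is truncated:

```java
public class SquaredRowSumSketch {
    // Squared sum over columns k+1 .. m-1 of row k, with hv being the
    // row-major linearized dense array of an m x m block (row k starts
    // at offset k * m, matching the dotProduct offsets in the PR).
    static double xNormSqr(double[] hv, int m, int k) {
        double s = 0;
        for (int j = k + 1; j < m; ++j) {
            double c = hv[k * m + j]; // cell (k, j)
            s += c * c;
        }
        return s;
        // Equivalents tried in the PR without a speedup:
        //   LibMatrixMult.dotProduct(hv, hv, k*m + k+1, k*m + k+1, m - (k+1));
        //   householderVectors.slice(k, k, k + 1, m - 1).sumSq();
    }

    public static void main(String[] args) {
        double[] hv = {1, 2, 3, 4, 5, 6, 7, 8, 9}; // 3x3, row-major
        System.out.println(xNormSqr(hv, 3, 0));    // 2^2 + 3^2 = 13.0
    }
}
```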

```java
householderVectors.sparseToDense(threads);
}

final double[] z = new double[m];
```
@Baunsgaard (Contributor):

This temporary allocation seems like the only thing that limits you from parallelizing the entire for loop.
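A hypothetical sketch (not the PR's code) of the point above: if each task allocates its own scratch buffer `z`, the loop can be parallelized without shared mutable state. The names `m` and `processColumn` are placeholders for the actual per-iteration work:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TaskLocalScratchSketch {
    // Placeholder for the per-index work that needs a length-m scratch array.
    static void processColumn(int k, double[] z) {
        z[k] = k; // dummy work
    }

    public static void main(String[] args) throws InterruptedException {
        final int m = 1000;
        final int threads = 4; // e.g. the 'threads' parameter of the new API
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int k = 0; k < m; k++) {
            final int kk = k;
            pool.submit(() -> {
                double[] z = new double[m]; // task-local, so no data race
                processColumn(kk, z);
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```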

```java
 * @param threads The number of threads to use for computation.
 * @return An array of MatrixBlock objects containing the real eigen values and eigen vectors.
 */
public static MatrixBlock[] computeEigenDecompositionSymm(MatrixBlock in, int threads) {
```
@Baunsgaard (Contributor):

I would like to know where the time is spent inside this function call.

Is it in transformToTridiagonal, getQ, or findEigenVectors?

@Aast12 (Author):

(profiler screenshot)

findEigenVectors seems to take ~60-70% of the time.
performEigenDecomposition there is a wrapper for the do-while loop, and most of its execution time is taken by the matrix updates at the end of each iteration.

@Baunsgaard (Contributor):

Great, this is something that is easy to fix: simply do not use the get and set methods on the dense block, but assign the cells directly on the underlying linearized array.

This should immediately reduce the execution time of that part by at least ~50%.
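A minimal sketch of this suggestion, assuming the output fits in a single contiguous dense block (which the PR already requires), so `valuesAt(0)` exposes all cells in row-major order; the per-cell value written here is a dummy, not the PR's actual update:

```java
import org.apache.sysds.runtime.data.DenseBlock;
import org.apache.sysds.runtime.matrix.data.MatrixBlock;

public class DenseBlockDirectWriteSketch {
    public static void main(String[] args) {
        int rlen = 4, clen = 4;
        MatrixBlock out = new MatrixBlock(rlen, clen, false);
        out.allocateDenseBlock();
        DenseBlock db = out.getDenseBlock();

        // Accessor-based update: the per-cell get/set pattern that shows
        // up as the hot path in the profile.
        for (int i = 0; i < rlen; i++)
            for (int j = 0; j < clen; j++)
                db.set(i, j, i + 0.1 * j);

        // Direct update on the underlying linearized array, as suggested:
        // fetch the array once and index it row-major, avoiding per-cell calls.
        double[] vals = db.valuesAt(0); // single contiguous block assumed
        for (int i = 0; i < rlen; i++) {
            int off = i * clen;
            for (int j = 0; j < clen; j++)
                vals[off + j] = i + 0.1 * j;
        }
    }
}
```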

```java
double[] qaV = qaB.valuesAt(0);

// build up first part of the matrix by applying Householder transforms
for (int k = m - 1; k >= 1; --k) {
```
@Baunsgaard (Contributor):

Is there a reason why it should be done in the opposite order?

Yes, there is a reason. From page 263 of "Matrix Computations" by Gene H. Golub and Charles F. Van Loan: "Recall that the leading (j − 1)-by-(j − 1) portion of Qj is the identity. Thus, at the beginning of backward accumulation, Q is “mostly the identity” and it gradually becomes full as the iteration progresses. This pattern can be exploited to reduce the number of required flops. In contrast, Q is full in forward accumulation after the first step. For this reason, backward accumulation is cheaper and the strategy of choice."
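To make the flop argument concrete, here is a self-contained sketch of backward accumulation on plain Java arrays (an illustration only, not the PR's DenseBlock-based code). It assumes reflectors H_k = I - beta[k] * v_k * v_k^T with v_k zero in entries 0..k; because the loop runs from the last reflector to the first, the not-yet-touched leading block of Q is still the identity, so each update only has to visit the trailing submatrix:

```java
public class BackwardAccumulationSketch {
    // Accumulate Q = H_0 * H_1 * ... * H_{m-2} backwards, where
    // H_k = I - beta[k] * v[k] * v[k]^T and v[k] is zero in entries 0..k.
    static double[][] accumulateQ(double[][] v, double[] beta, int m) {
        double[][] q = new double[m][m];
        for (int i = 0; i < m; i++)
            q[i][i] = 1.0; // start from the identity

        for (int k = m - 2; k >= 0; --k) { // backward over the reflectors
            // Here q equals H_{k+1} * ... * H_{m-2}, which is still the
            // identity outside rows/columns k+1 .. m-1, so columns 0 .. k
            // need no update at all (the flop saving described in the book).
            for (int j = k + 1; j < m; j++) {
                double s = 0;
                for (int i = k + 1; i < m; i++)
                    s += v[k][i] * q[i][j]; // s = v_k^T * q(:, j)
                s *= beta[k];
                for (int i = k + 1; i < m; i++)
                    q[i][j] -= s * v[k][i]; // q(:, j) -= beta * s * v_k
            }
        }
        return q;
    }

    public static void main(String[] args) {
        int m = 3;
        double[][] v = {{0, 1, 1}, {0, 0, 0}};      // one non-trivial reflector
        double[] beta = {2.0 / 2.0, 0.0};           // beta_k = 2 / (v_k^T v_k)
        double[][] q = accumulateQ(v, beta, m);
        System.out.println(q[1][1] + " " + q[1][2]); // expected: 0.0 -1.0
    }
}
```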

@Baunsgaard (Contributor):

Also, if you address any comments, feel free to close or resolve them.

@Baunsgaard (Contributor):

Just FYI, some tests are flaky when run in the cloud, so if the tests fail and they are obviously not related to your changes, no worries.

Labels: None yet
Projects: Status: In Progress
4 participants