Adapt EigenDecomposition implementation #2038

Open · wants to merge 6 commits into base: main from naive-impl

Conversation

@Aast12 commented Jun 30, 2024

Implements the Implicit QL Algorithm, also used by the Commons Math EigenDecomposition class, using SystemDS data structures. Note that this assumes the matrix is symmetric and fits in a single block of a DenseBlock. The latter is also assumed by the current computeEigen function, so the only missing functionality is support for non-symmetric matrices. The implementation focuses on reducing allocations as much as possible in order to match or exceed the performance of the current eigen decomposition function across most cases.
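For reference, a minimal usage sketch of the new entry point, assuming the `computeEigenDecompositionSymm(MatrixBlock in, int threads)` signature quoted later in this review; which index of the returned array holds the eigenvalues and which holds the eigenvectors is an assumption here, not confirmed by the PR:

```java
import org.apache.sysds.runtime.matrix.data.LibCommonsMath;
import org.apache.sysds.runtime.matrix.data.MatrixBlock;
import org.apache.sysds.runtime.util.DataConverter;

public class SymmEigenExample {
    public static void main(String[] args) {
        // Small symmetric input that fits in a single dense block,
        // as required by both the old and the new code path.
        double[][] a = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};
        MatrixBlock in = DataConverter.convertToMatrixBlock(a);

        // Decompose with 4 threads; the index assignment below (values
        // first, vectors second) is an assumption, check the javadoc.
        MatrixBlock[] out = LibCommonsMath.computeEigenDecompositionSymm(in, 4);
        MatrixBlock eigenValues = out[0];
        MatrixBlock eigenVectors = out[1];
        System.out.println("first eigenvalue: " + eigenValues.getValue(0, 0));
        System.out.println("V[0,0] = " + eigenVectors.getValue(0, 0));
    }
}
```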

As an initial benchmark, a test was run with Symmetric Positive Definite and tridiagonal (symmetric) matrices of sizes 100x100, 500x500 and 1000x1000. The following table shows the test cases and the number of iterations run per test.

| Test case | Iterations |
|---|---|
| spdTest100 | 300 |
| spdTest500 | 100 |
| spdTest1000 | 30 |
| tridiagonalTest100 | 300 |
| tridiagonalTest500 | 100 |
| tridiagonalTest1000 | 30 |

The following table shows the benchmark results, with an average improvement of 48.95% across all cases. The largest improvements occur on the 1000x1000 matrices, up to 83.61%.

| # | Instruction | Iterations | Time(s) Old | Time(s) New | Improvement |
|---|---|---|---|---|---|
| 1 | eigen | 3060 | 631.230 | 423.791 | 48.95% |
| 2 | tridiagonalTest500 | 500 | 201.269 | 143.769 | 40.00% |
| 3 | spdTest500 | 500 | 185.826 | 138.117 | 34.54% |
| 4 | tridiagonalTest1000 | 30 | 129.213 | 70.370 | 83.61% |
| 5 | spdTest1000 | 30 | 108.221 | 66.619 | 62.44% |
| 6 | spdTest100 | 1000 | 4.075 | 2.840 | 43.48% |
| 7 | tridiagonalTest100 | 1000 | 3.470 | 3.088 | 12.37% |

@Baunsgaard (Contributor):

Hi!

A good start to the PR.
I suggest you squash your commits and then rebase your changes on master.

That should fix your merge conflict.

@Aast12 force-pushed the naive-impl branch 3 times, most recently from 4ac147c to 177921a on June 30, 2024 21:46
@Aast12 (Author) commented Jun 30, 2024

> Hi!
>
> A good start to the PR. I suggest you squash your commits and then rebase your changes on master.
>
> That should fix your merge conflict.

It should be fixed now.

@codecov-commenter commented Jul 1, 2024

Codecov Report

Attention: Patch coverage is 86.03352% with 25 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@6a3ff8c). Learn more about missing BASE report.

| Files | Patch % | Lines |
|---|---|---|
| ...ache/sysds/runtime/matrix/data/LibCommonsMath.java | 86.03% | 12 Missing and 13 partials ⚠️ |
Additional details and impacted files
```
@@           Coverage Diff           @@
##             main    #2038   +/-   ##
=======================================
  Coverage        ?   68.80%
  Complexity      ?    40731
=======================================
  Files           ?     1440
  Lines           ?   161743
  Branches        ?    31461
=======================================
  Hits            ?   111282
  Misses          ?    41388
  Partials        ?     9073
```


@Aast12 marked this pull request as ready for review on July 1, 2024 09:47
@Baunsgaard (Contributor) left a comment:

The implementation looks suspiciously close to the Commons Math implementation, with minor modifications. There are some key elements that make it faster, but I think we can refine it a bit.

```java
// double xNormSqr = householderVectors.slice(k, k, k + 1, m - 1).sumSq();
// double xNormSqr = LibMatrixMult.dotProduct(hv, hv, k_kp1, k_kp1, m - (k + 1));
double xNormSqr = 0;
for (int j = k + 1; j < m; ++j) {
```
@Baunsgaard (Contributor):

This looks like a squared row sum; this is an operation you can perform directly.

@Aast12 (Author):

We tried these other approaches to get a performance boost, but they either did not change the performance or made it slower:

```java
double xNormSqr = householderVectors.slice(k, k, k + 1, m - 1).sumSq();
double xNormSqr = LibMatrixMult.dotProduct(hv, hv, k * m + k + 1, k * m + k + 1, m - (k + 1));
```

Are you referring to another way to do it directly?

@Baunsgaard (Contributor):

Okay, then we do not change it.
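For reference, a self-contained sketch of the squared row sum discussed in this thread, assuming `hv` is the row-major linearized dense array of `householderVectors` (so row `k` starts at offset `k * m`, matching the offsets in the `dotProduct` call above); the loop body is filled in here only as an illustration, since the diff context quoted earlier is truncated:

```java
public class SquaredRowSumSketch {
    // Squared sum over columns k+1 .. m-1 of row k, with hv being the
    // row-major linearized dense array of an m x m block (row k starts
    // at offset k * m, matching the dotProduct offsets in the PR).
    static double xNormSqr(double[] hv, int m, int k) {
        double s = 0;
        for (int j = k + 1; j < m; ++j) {
            double c = hv[k * m + j]; // cell (k, j)
            s += c * c;
        }
        return s;
        // Equivalents tried in the PR without a speedup:
        //   LibMatrixMult.dotProduct(hv, hv, k*m + k+1, k*m + k+1, m - (k+1));
        //   householderVectors.slice(k, k, k + 1, m - 1).sumSq();
    }

    public static void main(String[] args) {
        double[] hv = {1, 2, 3, 4, 5, 6, 7, 8, 9}; // 3x3, row-major
        System.out.println(xNormSqr(hv, 3, 0));    // 2^2 + 3^2 = 13.0
    }
}
```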

```java
householderVectors.sparseToDense(threads);
}

final double[] z = new double[m];
```
@Baunsgaard (Contributor):

This temporary allocation seems like the only thing that limits you from parallelizing the entire for loop.
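A hypothetical sketch (not the PR's code) of the point above: if each task allocates its own scratch buffer `z`, the loop can be parallelized without shared mutable state. The names `m` and `processColumn` are placeholders for the actual per-iteration work:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TaskLocalScratchSketch {
    // Placeholder for the per-index work that needs a length-m scratch array.
    static void processColumn(int k, double[] z) {
        z[k] = k; // dummy work
    }

    public static void main(String[] args) throws InterruptedException {
        final int m = 1000;
        final int threads = 4; // e.g. the 'threads' parameter of the new API
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int k = 0; k < m; k++) {
            final int kk = k;
            pool.submit(() -> {
                double[] z = new double[m]; // task-local, so no data race
                processColumn(kk, z);
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```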

```java
 * @param threads The number of threads to use for computation.
 * @return An array of MatrixBlock objects containing the real eigen values and eigen vectors.
 */
public static MatrixBlock[] computeEigenDecompositionSymm(MatrixBlock in, int threads) {
```
@Baunsgaard (Contributor):

I would like to know where the time is spent inside this function call.

Is it in transformToTridiagonal, getQ, or findEigenVectors?

@Aast12 (Author):

(profiler screenshot)

findEigenVectors seems to take ~60-70% of the time.
performEigenDecomposition there is a wrapper for the do-while loop, and most of its execution time is taken by the matrix updates at the end of each iteration.

@Baunsgaard (Contributor):

Great, this is something that is easy to fix: simply do not use the get and set methods on the dense block, but assign the cells directly on the underlying linearized array.

This should immediately reduce the execution time of that part by at least ~50%.
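A minimal sketch of this suggestion, assuming the output fits in a single contiguous dense block (which the PR already requires), so `valuesAt(0)` exposes all cells in row-major order; the per-cell value written here is a dummy, not the PR's actual update:

```java
import org.apache.sysds.runtime.data.DenseBlock;
import org.apache.sysds.runtime.matrix.data.MatrixBlock;

public class DenseBlockDirectWriteSketch {
    public static void main(String[] args) {
        int rlen = 4, clen = 4;
        MatrixBlock out = new MatrixBlock(rlen, clen, false);
        out.allocateDenseBlock();
        DenseBlock db = out.getDenseBlock();

        // Accessor-based update: the per-cell get/set pattern that shows
        // up as the hot path in the profile.
        for (int i = 0; i < rlen; i++)
            for (int j = 0; j < clen; j++)
                db.set(i, j, i + 0.1 * j);

        // Direct update on the underlying linearized array, as suggested:
        // fetch the array once and index it row-major, avoiding per-cell calls.
        double[] vals = db.valuesAt(0); // single contiguous block assumed
        for (int i = 0; i < rlen; i++) {
            int off = i * clen;
            for (int j = 0; j < clen; j++)
                vals[off + j] = i + 0.1 * j;
        }
    }
}
```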

```java
double[] qaV = qaB.valuesAt(0);

// build up first part of the matrix by applying Householder transforms
for (int k = m - 1; k >= 1; --k) {
```
@Baunsgaard (Contributor):

Is there a reason why it should be done in the opposite order?

Yes, there is a reason. From page 263 of "Matrix Computations" by Gene H. Golub and Charles F. Van Loan: "Recall that the leading (j − 1)-by-(j − 1) portion of Qj is the identity. Thus, at the beginning of backward accumulation, Q is “mostly the identity” and it gradually becomes full as the iteration progresses. This pattern can be exploited to reduce the number of required flops. In contrast, Q is full in forward accumulation after the first step. For this reason, backward accumulation is cheaper and the strategy of choice."
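To make the flop argument concrete, here is a self-contained sketch of backward accumulation on plain Java arrays (an illustration only, not the PR's DenseBlock-based code). It assumes reflectors H_k = I - beta[k] * v_k * v_k^T with v_k zero in entries 0..k; because the loop runs from the last reflector to the first, the not-yet-touched leading block of Q is still the identity, so each update only has to visit the trailing submatrix:

```java
public class BackwardAccumulationSketch {
    // Accumulate Q = H_0 * H_1 * ... * H_{m-2} backwards, where
    // H_k = I - beta[k] * v[k] * v[k]^T and v[k] is zero in entries 0..k.
    static double[][] accumulateQ(double[][] v, double[] beta, int m) {
        double[][] q = new double[m][m];
        for (int i = 0; i < m; i++)
            q[i][i] = 1.0; // start from the identity

        for (int k = m - 2; k >= 0; --k) { // backward over the reflectors
            // Here q equals H_{k+1} * ... * H_{m-2}, which is still the
            // identity outside rows/columns k+1 .. m-1, so columns 0 .. k
            // need no update at all (the flop saving described in the book).
            for (int j = k + 1; j < m; j++) {
                double s = 0;
                for (int i = k + 1; i < m; i++)
                    s += v[k][i] * q[i][j]; // s = v_k^T * q(:, j)
                s *= beta[k];
                for (int i = k + 1; i < m; i++)
                    q[i][j] -= s * v[k][i]; // q(:, j) -= beta * s * v_k
            }
        }
        return q;
    }

    public static void main(String[] args) {
        int m = 3;
        double[][] v = {{0, 1, 1}, {0, 0, 0}};      // one non-trivial reflector
        double[] beta = {2.0 / 2.0, 0.0};           // beta_k = 2 / (v_k^T v_k)
        double[][] q = accumulateQ(v, beta, m);
        System.out.println(q[1][1] + " " + q[1][2]); // expected: 0.0 -1.0
    }
}
```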

@Baunsgaard (Contributor):

Also, if you address any comments, feel free to close or resolve them.

@Baunsgaard (Contributor):

Just FYI, some tests are flaky when run in the cloud, so if the tests fail and they are obviously not related to your changes, no worries.

Labels: None yet
Projects: Status: In Progress
4 participants