-
Notifications
You must be signed in to change notification settings - Fork 77
Handle intrinsics in a more efficent manner. #687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Love the idea! Funnily enough that's what I suggested to @antoyo when opened the issue about the too big |
4cb188f
to
a579de2
Compare
a579de2
to
1bfbd0d
Compare
1bfbd0d
to
82160c4
Compare
I closed the PR by accident while rebasing. This new refactored version is a fair bit simpler: I only change the intrinsic generation, and change Should be good to merge now(fingers crossed). |
I now know why the stdarch build keeps breaking... something seems to be wrong with the original script. If I just run When I run the unmodified script, it does show an error:
@antoyo have you seen this error before? What happens if you run the unmodified intrinsic generation script now? Maybe something changed in the LLVM repo? |
The Overall, it's very depressing as a lot of intrinsics are actually not listed, hence why we have the very old |
cfe61de
to
07c7e85
Compare
I retested with the
Now, it should be good to review. |
Just a thought: I don't think I ever wrote how to build |
Sorry, I misled you a little. I did not build td from scratch, I just copied it over from the LLVM built I had in my rust repo(in the I tought I was building LLVM from source anyway(I was testing something related to autodiff, which is not enabled by deafult), but I forgot that I cleared & redownloaded the repo a couple of days ago. So, my |
Seems good enough, please provide that information. :) |
Will open a PR for that soon :). |
Awesome, thanks! Gonna review this PR tomorrow. |
@FractalFir: Could you please put the regeneration of the intrinsics in a separate commit? |
I could, it is just that the changes to the generated intrinsics require changes to So, the generated intrinsics are not functional without that, and vice-versa. Should I do that? |
Yes please. |
07c7e85
to
6fbac93
Compare
Should be good for review now :D! |
Looks good to me, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nitpicks.
For the formatting stuff, I know you call rustfmt
, but I still believe it's nice to have a better formatting in the generate script to make it a bit easier to read.
Thanks a lot for improving this!
Co-authored-by: antoyo <[email protected]>
The current implementation of intrinsics is very unoptimized.
In Rust, a match on string gets compiled down to what is effectively an if ladder(maybe we should consider opening an upstream issue about this). This is crazy inefficient, both in terms of the number of basic blocks(and thus compile times), and in the number of comparisons required to match a string(example: matching the 1000 stting will require 1000 comparisons).
The sheer amount of comparisons in
src::intrinsics::llvm::intrinsics
triggers a GCC bug. While trying to recurse on the basic block, GCC overflows its stack.This PR splits that string matching into a couple of functions, dedicated to specific architectures(e.g. ARM) or extensions(e.g. AVX).
This brings both runtime improvements(less comparisons needed) and pretty significant compiletime improvements.
In the master branch, the function in question is the heaviest one in terms of generated LLVM IR, and by a wide margin.
With the patch, the problematic functions are still complex, but are a bit more managable.
On a debug build, this PR reduced build times by ~30 %.
Debug without the patch:
Debug with the patch:
In release, the difference is an over 3x reduction in build times!
Release without patch:
Release with patch:
We still have some tradeoffs to consider. Taking into account the laughably poor codegen for such a match(in both LLVM and GCC), we might just consider using a hash map. There is a very high chance it would have much better runtime & compiletime performance anyway.