Handle intrinsics in a more efficient manner. #687

Open: wants to merge 3 commits into base branch master

FractalFir
Contributor

The current implementation of intrinsics is very unoptimized.

In Rust, a match on a string gets compiled down to what is effectively an if ladder (maybe we should consider opening an upstream issue about this). This is crazy inefficient, both in terms of the number of basic blocks (and thus compile times) and in the number of comparisons required to match a string (example: matching the 1000th string requires 1000 comparisons).
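To make the claim concrete, here is a minimal sketch of the two shapes being compared. The function names are hypothetical and the two entries are only illustrative; the point is that the second function spells out the sequential comparisons that the first one is said to lower to.

```rust
/// Written as a `match`: compact in source form.
fn lookup_match(name: &str) -> Option<&'static str> {
    match name {
        "llvm.nvvm.barrier0" => Some("__nvvm_bar0"),
        "llvm.nvvm.isspacep.shared.cluster" => Some("__nvvm_isspacep_shared_cluster"),
        _ => None,
    }
}

/// The equivalent if ladder: one string comparison per arm, tried in order.
/// With N arms, the last arm (or a miss) costs N comparisons.
fn lookup_ladder(name: &str) -> Option<&'static str> {
    if name == "llvm.nvvm.barrier0" {
        Some("__nvvm_bar0")
    } else if name == "llvm.nvvm.isspacep.shared.cluster" {
        Some("__nvvm_isspacep_shared_cluster")
    } else {
        None
    }
}
```

Both functions return the same result for every input; only the cost model differs once the table grows to thousands of arms.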

The sheer number of comparisons in rustc_codegen_gcc::intrinsic::llvm::intrinsic triggers a GCC bug. While trying to recurse on the basic blocks, GCC overflows its stack.

This PR splits that string matching into a couple of functions, each dedicated to a specific architecture (e.g. ARM) or extension (e.g. AVX).
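A rough sketch of what such a split can look like, assuming the architecture is the dot-separated segment right after the `llvm.` prefix. The function names, the prefix handling, and the handful of entries are illustrative assumptions, not the generated code from this PR.

```rust
/// Hypothetical two-level dispatch: pick the architecture first,
/// then match only within that architecture's much smaller table.
fn map_arch_intrinsic(full_name: &str) -> Option<&'static str> {
    // Names without the "llvm." prefix are not handled in this sketch.
    let name = full_name.strip_prefix("llvm.")?;
    // Split e.g. "nvvm.barrier0" into ("nvvm", "barrier0").
    let (arch, rest) = name.split_once('.')?;
    match arch {
        "nvvm" => nvvm(rest),
        "x86" => x86(rest),
        _ => None,
    }
}

/// Per-architecture table (illustrative subset).
fn nvvm(name: &str) -> Option<&'static str> {
    match name {
        "barrier0" => Some("__nvvm_bar0"),
        "isspacep.shared.cluster" => Some("__nvvm_isspacep_shared_cluster"),
        _ => None,
    }
}

/// Per-architecture table (single entry, assumed for illustration only).
fn x86(name: &str) -> Option<&'static str> {
    match name {
        "sse2.pause" => Some("__builtin_ia32_pause"),
        _ => None,
    }
}
```

With this shape, a lookup pays for one short comparison ladder over architecture names plus one ladder over that architecture's entries, instead of a single ladder over every intrinsic.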

This brings both runtime improvements (fewer comparisons needed) and pretty significant compile-time improvements.

In the master branch, the function in question is the heaviest one in terms of generated LLVM IR, and by a wide margin.

   93555 (13.3%, 13.3%)     1 (0.0%,  0.0%)  rustc_codegen_gcc::intrinsic::llvm::intrinsic
   10354 (1.5%, 14.8%)     62 (0.4%,  0.5%)  alloc::vec::in_place_collect::from_iter_in_place

With the patch, the problematic functions are still complex, but are a bit more manageable.

   16112 (2.4%,  2.4%)      1 (0.0%,  0.0%)  rustc_codegen_gcc::intrinsic::llvm::map_arch_intrinsic::x86
   12404 (1.9%,  4.3%)      1 (0.0%,  0.0%)  rustc_codegen_gcc::intrinsic::llvm::map_arch_intrinsic::hexagon

On a debug build, this PR reduced build times by roughly 26% (10.26s down to 7.59s).

Debug without the patch:

Finished `dev` profile [optimized + debuginfo] target(s) in 10.26s

Debug with the patch:

Finished `dev` profile [optimized + debuginfo] target(s) in 7.59s

In release, build times drop by more than 3x (31.02s down to 8.33s)!
Release without patch:

    Finished `release` profile [optimized] target(s) in 31.02s

Release with patch:

   Finished `release` profile [optimized] target(s) in 8.33s

We still have some tradeoffs to consider. Given the laughably poor codegen for such a match (in both LLVM and GCC), we might just consider using a hash map. There is a very high chance it would have much better runtime and compile-time performance anyway.
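For comparison, a minimal sketch of the hash map alternative, assuming a lazily built static table via the standard library's `OnceLock`. The names `intrinsic_table` and `lookup_intrinsic` are hypothetical, and the entries are a tiny illustrative subset; this is not something the PR implements.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Builds the name table once, on first use, and reuses it afterwards.
fn intrinsic_table() -> &'static HashMap<&'static str, &'static str> {
    static TABLE: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();
    TABLE.get_or_init(|| {
        HashMap::from([
            ("llvm.nvvm.barrier0", "__nvvm_bar0"),
            ("llvm.nvvm.isspacep.shared.cluster", "__nvvm_isspacep_shared_cluster"),
        ])
    })
}

/// A lookup hashes the name once instead of walking a comparison ladder.
fn lookup_intrinsic(name: &str) -> Option<&'static str> {
    intrinsic_table().get(name).copied()
}
```

The tradeoff is a one-time table construction at runtime in exchange for a data-only definition that the compiler does not have to lower into thousands of basic blocks.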

@GuillaumeGomez
Member

Love the idea! Funnily enough, that's what I suggested to @antoyo when the issue about the too-big match was opened.

@FractalFir FractalFir force-pushed the better_intrinsics branch 2 times, most recently from 4cb188f to a579de2 Compare May 27, 2025 10:03
@FractalFir
Contributor Author

FractalFir commented Jun 2, 2025

I closed the PR by accident while rebasing.

This new refactored version is a fair bit simpler: I only change the intrinsic generation, and change llvm.rs to make it call a function generated by the Python script. Besides that, nothing has changed.

Should be good to merge now (fingers crossed).

@FractalFir FractalFir reopened this Jun 2, 2025
@FractalFir
Contributor Author

I now know why the stdarch build keeps breaking... something seems to be wrong with the original script. If I just run python tools/generate_intrinsics.py on the current master, it misses a whole lot of intrinsics (about 4.5K; the generated file is substantially smaller).

When I run the unmodified script, it does show an error:

llvm/IR/Intrinsics.td:664:23: error: Unknown operator
  list<int> TypeSig = !listflatten(!listconcat(
                      ^
llvm/IR/Intrinsics.td:664:23: error: Unknown or reserved token when parsing a value
  list<int> TypeSig = !listflatten(!listconcat(
                      ^
llvm/IR/Intrinsics.td:664:23: error: expected ';' after declaration
  list<int> TypeSig = !listflatten(!listconcat(

@antoyo have you seen this error before? What happens if you run the unmodified intrinsic generation script now? Maybe something changed in the LLVM repo?

@GuillaumeGomez
Member

The td files tend to break, so you need to compile the tool yourself to get its most up-to-date version. Sometimes it's just broken, though; I sent a few patches to get them working.

Overall, it's very depressing as a lot of intrinsics are actually not listed, hence why we have the very old llvmint* projects still being used, which I'm trying to get rid of.

@FractalFir FractalFir force-pushed the better_intrinsics branch from cfe61de to 07c7e85 Compare June 2, 2025 20:48
@FractalFir
Contributor Author

FractalFir commented Jun 2, 2025

I retested with the td from Rustc's CI, and the issue disappeared. Now we should have all the intrinsics: the only differences I observed when running the original Python script were a couple of nvvm intrinsics getting added or adjusted.

<     "llvm.nvvm.barrier" => "__nvvm_bar",
<     "llvm.nvvm.barrier.n" => "__nvvm_bar_n",
<     "llvm.nvvm.barrier.sync" => "__nvvm_barrier_sync",
<     "llvm.nvvm.barrier.sync.cnt" => "__nvvm_barrier_sync_cnt",
<     "llvm.nvvm.barrier0" => "__syncthreads",
<     // [DUPLICATE]: "llvm.nvvm.barrier0" => "__nvvm_bar0",
---
>     "llvm.nvvm.barrier0" => "__nvvm_bar0",
>     // [DUPLICATE]: "llvm.nvvm.barrier0" => "__syncthreads",
4660a4657,4658
>     "llvm.nvvm.e2m1x2.to.f16x2.rn" => "__nvvm_e2m1x2_to_f16x2_rn",
>     "llvm.nvvm.e2m1x2.to.f16x2.rn.relu" => "__nvvm_e2m1x2_to_f16x2_rn_relu",
4727a4726,4727
>     "llvm.nvvm.ff.to.e2m1x2.rn.relu.satfinite" => "__nvvm_ff_to_e2m1x2_rn_relu_satfinite",
>     "llvm.nvvm.ff.to.e2m1x2.rn.satfinite" => "__nvvm_ff_to_e2m1x2_rn_satfinite",
4838a4839
>     "llvm.nvvm.isspacep.shared.cluster" => "__nvvm_isspacep_shared_cluster",

Now, it should be good to review.

@GuillaumeGomez
Member

Just a thought: I don't think I ever wrote how to build td from llvm source. If I didn't (shame on me), can you add docs for it please?

@FractalFir
Contributor Author

Sorry, I misled you a little.

I did not build td from scratch; I just copied it over from the LLVM build I had in my rust repo (in the build directory). That seemed like the simplest way of getting the right td version.

I thought I was building LLVM from source anyway (I was testing something related to autodiff, which is not enabled by default), but I forgot that I cleared and redownloaded the repo a couple of days ago.

So, my td is just the one included with the CI builds of Rustc's LLVM.

@GuillaumeGomez
Member

Seems good enough, please provide that information. :)

@FractalFir
Contributor Author

Will open a PR for that soon :).

@GuillaumeGomez
Member

Awesome, thanks! Gonna review this PR tomorrow.

@antoyo
Contributor

antoyo commented Jun 2, 2025

@FractalFir: Could you please put the regeneration of the intrinsics in a separate commit?

@FractalFir
Contributor Author

I could, it is just that the changes to the generated intrinsics require changes to llvm.rs (to call a function I generated).

So, the generated intrinsics are not functional without that, and vice-versa.

Should I do that?

@antoyo
Contributor

antoyo commented Jun 2, 2025

Yes please.

@FractalFir FractalFir force-pushed the better_intrinsics branch from 07c7e85 to 6fbac93 Compare June 3, 2025 08:10
@FractalFir
Contributor Author

Should be good for review now :D!

@GuillaumeGomez
Member

Looks good to me, thanks!

Contributor

@antoyo antoyo left a comment

A few nitpicks.
For the formatting stuff, I know you call rustfmt, but I still believe it's nice to have better formatting in the generation script to make it a bit easier to read.
Thanks a lot for improving this!
