Handle intrinsics in a more efficient manner. #687

FractalFir · 2025-05-27T08:40:13Z

The current implementation of intrinsics is very unoptimized.

In Rust, a match on strings gets compiled down to what is effectively an if ladder (maybe we should consider opening an upstream issue about this). This is crazy inefficient, both in terms of the number of basic blocks (and thus compile times) and in the number of comparisons required to match a string (example: matching the 1000th string will require 1000 comparisons).
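To make that cost concrete, here is a minimal sketch that models the if ladder as a linear scan and counts the comparisons. The helpers `ladder_lookup` and `synthetic_arms` and the `llvm.fake.*` names are made up for illustration; this is a model of the behavior described above, not the code rustc actually emits.

```rust
/// Models a string `match` lowered to an if ladder: each arm is tried in
/// order until one compares equal. Returns the index of the matching arm
/// (if any) and the number of string comparisons performed.
/// Hypothetical helper, for illustration only.
fn ladder_lookup(arms: &[String], needle: &str) -> (Option<usize>, usize) {
    let mut comparisons = 0;
    for (index, arm) in arms.iter().enumerate() {
        comparisons += 1;
        if arm.as_str() == needle {
            return (Some(index), comparisons);
        }
    }
    (None, comparisons)
}

/// Builds `count` synthetic arm names: "llvm.fake.0", "llvm.fake.1", ...
fn synthetic_arms(count: usize) -> Vec<String> {
    (0..count).map(|i| format!("llvm.fake.{i}")).collect()
}
```

With 1000 synthetic arms, looking up the last one (or a name that is not present at all) walks the whole ladder, while the first arm is found after a single comparison.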

The sheer number of comparisons in src::intrinsics::llvm::intrinsics triggers a GCC bug: while trying to recurse on the basic blocks, GCC overflows its stack.

This PR splits that string matching into a couple of functions, each dedicated to a specific architecture (e.g. ARM) or extension (e.g. AVX).
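A rough sketch of what such a split could look like is below. The name `map_arch_intrinsic` and the nested `x86` function mirror the symbol names in the measurements later in this description; everything else is an assumption: the `nvvm` helper, the `Option` return type, and the way the architecture is extracted from the name. The nvvm entries are taken from the regenerated table discussed in this thread, and the x86 entry is only illustrative.

```rust
/// Two-level dispatch: first pick the architecture from the name, then run a
/// much smaller per-architecture match. A sketch under stated assumptions,
/// not the generated code itself.
fn map_arch_intrinsic(full_name: &str) -> Option<&'static str> {
    // Per-architecture tables. Each one only holds its own intrinsics, so
    // each match stays comparatively small.
    fn nvvm(name: &str) -> Option<&'static str> {
        match name {
            "barrier0" => Some("__nvvm_bar0"),
            "isspacep.shared.cluster" => Some("__nvvm_isspacep_shared_cluster"),
            _ => None,
        }
    }

    fn x86(name: &str) -> Option<&'static str> {
        match name {
            // Illustrative entry, not taken from this thread.
            "sse2.pause" => Some("__builtin_ia32_pause"),
            _ => None,
        }
    }

    // "llvm.nvvm.barrier0" -> arch = "nvvm", name = "barrier0"
    let rest = full_name.strip_prefix("llvm.")?;
    let (arch, name) = rest.split_once('.')?;
    match arch {
        "nvvm" => nvvm(name),
        "x86" => x86(name),
        // Assumption: unknown architectures simply yield `None` here.
        _ => None,
    }
}
```

The top-level match only has one arm per architecture, so a lookup pays for a handful of architecture comparisons plus the comparisons inside a single per-architecture function.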

This brings both runtime improvements (fewer comparisons needed) and pretty significant compile-time improvements.

In the master branch, the function in question is the heaviest one in terms of generated LLVM IR, and by a wide margin.

   93555 (13.3%, 13.3%)      1 (0.0%,  0.0%)  rustc_codegen_gcc::intrinsic::llvm::intrinsic
   10354 (1.5%, 14.8%)     62 (0.4%,  0.5%)  alloc::vec::in_place_collect::from_iter_in_place

With the patch, the problematic functions are still complex, but a bit more manageable.

   16112 (2.4%,  2.4%)      1 (0.0%,  0.0%)  rustc_codegen_gcc::intrinsic::llvm::map_arch_intrinsic::x86
   12404 (1.9%,  4.3%)      1 (0.0%,  0.0%)  rustc_codegen_gcc::intrinsic::llvm::map_arch_intrinsic::hexagon

On a debug build, this PR reduced build times by roughly 26% (10.26s to 7.59s).

Debug without the patch:

Finished `dev` profile [optimized + debuginfo] target(s) in 10.26s

Debug with the patch:

Finished `dev` profile [optimized + debuginfo] target(s) in 7.59s

In release, build times drop by more than 3x!

Release without the patch:

    Finished `release` profile [optimized] target(s) in 31.02s

Release with the patch:

   Finished `release` profile [optimized] target(s) in 8.33s

We still have some tradeoffs to consider. Given the laughably poor codegen for such a match (in both LLVM and GCC), we might just consider using a hash map. There is a very high chance it would have much better runtime and compile-time performance anyway.
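For comparison, a hash-map variant might look roughly like the sketch below. This is not part of the PR: `INTRINSICS` and `lookup_intrinsic` are hypothetical names, the table only holds a few nvvm entries from the regenerated list in this thread, and building the map lazily with `std::sync::OnceLock` is just one possible choice.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Flat (LLVM name, GCC builtin) table. In a real setup this would be the
/// generated data; here it is a tiny hand-written sample.
const INTRINSICS: &[(&str, &str)] = &[
    ("llvm.nvvm.barrier0", "__nvvm_bar0"),
    ("llvm.nvvm.e2m1x2.to.f16x2.rn", "__nvvm_e2m1x2_to_f16x2_rn"),
    ("llvm.nvvm.isspacep.shared.cluster", "__nvvm_isspacep_shared_cluster"),
];

/// Looks up a builtin name through a hash map that is built once, on first use.
fn lookup_intrinsic(name: &str) -> Option<&'static str> {
    static TABLE: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();
    TABLE
        .get_or_init(|| INTRINSICS.iter().copied().collect())
        .get(name)
        .copied()
}
```

The data lives in a plain constant slice instead of thousands of match arms, so the compiler has far fewer basic blocks to process, and each lookup hashes the name once instead of walking a ladder of comparisons. The tradeoff is a one-time cost to build the map at runtime.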

GuillaumeGomez · 2025-05-27T09:13:02Z

Love the idea! Funnily enough, that's what I suggested to @antoyo when the issue about the too-big match was opened.

FractalFir · 2025-06-02T13:31:36Z

I closed the PR by accident while rebasing.

This new refactored version is a fair bit simpler: I only change the intrinsic generation, and change llvm.rs so that it calls a function generated by the Python script. Besides that, nothing has changed.

Should be good to merge now (fingers crossed).

FractalFir · 2025-06-02T19:11:44Z

I now know why the stdarch build keeps breaking... something seems to be wrong with the original script. If I just run python tools/generate_intrinsics.py on the current master, it misses a whole lot of intrinsics (about 4.5K; the generated file is substantially smaller).

When I run the unmodified script, it does show an error:

llvm/IR/Intrinsics.td:664:23: error: Unknown operator
  list<int> TypeSig = !listflatten(!listconcat(
                      ^
llvm/IR/Intrinsics.td:664:23: error: Unknown or reserved token when parsing a value
  list<int> TypeSig = !listflatten(!listconcat(
                      ^
llvm/IR/Intrinsics.td:664:23: error: expected ';' after declaration
  list<int> TypeSig = !listflatten(!listconcat(

@antoyo have you seen this error before? What happens if you run the unmodified intrinsic generation script now? Maybe something changed in the LLVM repo?

GuillaumeGomez · 2025-06-02T20:31:32Z

The td files tend to break, so you need to compile the tool to have its most up-to-date version. Sometimes it's just broken, though; I sent a few patches to get them working.

Overall, it's very depressing, as a lot of intrinsics are actually not listed, which is why we still use the very old llvmint* projects that I'm trying to get rid of.

FractalFir · 2025-06-02T20:51:45Z

I retested with the td from Rustc's CI, and the issue disappeared. Now we should have all the intrinsics: the only differences I observed when running the original Python script were a couple of nvvm intrinsics getting added or adjusted.

<     "llvm.nvvm.barrier" => "__nvvm_bar",
<     "llvm.nvvm.barrier.n" => "__nvvm_bar_n",
<     "llvm.nvvm.barrier.sync" => "__nvvm_barrier_sync",
<     "llvm.nvvm.barrier.sync.cnt" => "__nvvm_barrier_sync_cnt",
<     "llvm.nvvm.barrier0" => "__syncthreads",
<     // [DUPLICATE]: "llvm.nvvm.barrier0" => "__nvvm_bar0",
---
>     "llvm.nvvm.barrier0" => "__nvvm_bar0",
>     // [DUPLICATE]: "llvm.nvvm.barrier0" => "__syncthreads",
4660a4657,4658
>     "llvm.nvvm.e2m1x2.to.f16x2.rn" => "__nvvm_e2m1x2_to_f16x2_rn",
>     "llvm.nvvm.e2m1x2.to.f16x2.rn.relu" => "__nvvm_e2m1x2_to_f16x2_rn_relu",
4727a4726,4727
>     "llvm.nvvm.ff.to.e2m1x2.rn.relu.satfinite" => "__nvvm_ff_to_e2m1x2_rn_relu_satfinite",
>     "llvm.nvvm.ff.to.e2m1x2.rn.satfinite" => "__nvvm_ff_to_e2m1x2_rn_satfinite",
4838a4839
>     "llvm.nvvm.isspacep.shared.cluster" => "__nvvm_isspacep_shared_cluster",

Now, it should be good to review.
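The `[DUPLICATE]` lines in the diff above come from LLVM names that map to more than one builtin. The sketch below shows one way such arms could be rendered, assuming the first mapping seen wins and later ones are kept as comments. It is written in Rust for illustration only: the real generator is the Python script, and `render_arms` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Renders one match arm per entry. The first mapping for a given LLVM name
/// is emitted as a real arm; any later mapping for the same name is emitted
/// as a `// [DUPLICATE]` comment so that the match stays valid.
/// Assumption: "first one wins" is a guess based on the diff above.
fn render_arms(entries: &[(&str, &str)]) -> Vec<String> {
    let mut seen = HashSet::new();
    entries
        .iter()
        .map(|(llvm_name, builtin)| {
            let arm = format!("\"{llvm_name}\" => \"{builtin}\",");
            if seen.insert(*llvm_name) {
                arm
            } else {
                format!("// [DUPLICATE]: {arm}")
            }
        })
        .collect()
}
```

Under that assumption, which builtin ends up active depends purely on the order of the input, which would explain why `llvm.nvvm.barrier0` flipped between `__syncthreads` and `__nvvm_bar0` once the td input changed.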

GuillaumeGomez · 2025-06-02T20:53:33Z

Just a thought: I don't think I ever wrote how to build td from llvm source. If I didn't (shame on me), can you add docs for it please?

FractalFir · 2025-06-02T21:04:38Z

Sorry, I misled you a little.

I did not build td from scratch; I just copied it over from the LLVM build I had in my rust repo (in the build directory). That seemed like the simplest way of getting the right td version.

I thought I was building LLVM from source anyway (I was testing something related to autodiff, which is not enabled by default), but I forgot that I cleared and redownloaded the repo a couple of days ago.

So, my td is just the one included with the CI builds of Rustc's LLVM.

GuillaumeGomez · 2025-06-02T21:09:44Z

Seems good enough, please provide that information. :)

FractalFir · 2025-06-02T21:11:39Z

Will open a PR for that soon :).

GuillaumeGomez · 2025-06-02T21:13:39Z

Awesome, thanks! Gonna review this PR tomorrow.

antoyo · 2025-06-02T22:23:26Z

@FractalFir: Could you please put the regeneration of the intrinsics in a separate commit?

FractalFir · 2025-06-02T22:25:54Z

I could, it is just that the changes to the generated intrinsics require changes to llvm.rs (to call a function I generated).

So, the generated intrinsics are not functional without that, and vice-versa.

Should I do that?

antoyo · 2025-06-02T23:02:48Z

Yes please.

FractalFir · 2025-06-03T13:06:14Z

Should be good for review now :D!

GuillaumeGomez · 2025-06-03T14:06:57Z

Looks good to me, thanks!

antoyo

A few nitpicks.
For the formatting stuff, I know you call rustfmt, but I still believe it's nice to have better formatting in the generation script to make it a bit easier to read.
Thanks a lot for improving this!

GuillaumeGomez approved these changes May 27, 2025

FractalFir force-pushed the better_intrinsics branch 2 times, most recently from 4cb188f to a579de2 Compare May 27, 2025 10:03

antoyo reviewed May 27, 2025

FractalFir force-pushed the better_intrinsics branch from a579de2 to 1bfbd0d Compare June 2, 2025 13:25

FractalFir closed this Jun 2, 2025

FractalFir force-pushed the better_intrinsics branch from 1bfbd0d to 82160c4 Compare June 2, 2025 13:26

FractalFir reopened this Jun 2, 2025

FractalFir force-pushed the better_intrinsics branch from cfe61de to 07c7e85 Compare June 2, 2025 20:48

FractalFir added 2 commits June 3, 2025 10:10

Changed intrinsic generation

659d996

Regenerated intrinsics

6fbac93

FractalFir force-pushed the better_intrinsics branch from 07c7e85 to 6fbac93 Compare June 3, 2025 08:10

GuillaumeGomez approved these changes Jun 3, 2025

antoyo reviewed Jun 3, 2025

Apply suggestions from code review

e4c9584

Co-authored-by: antoyo <[email protected]>
