Replies: 24 comments 30 replies
-
My Pass "Testing" Thoughts
-
        
Pass

This week was a bit simpler because of exams, but I wrote a quick pass that counts the number of instructions in each basic block and calls a runtime function with the index of the block and its instruction count, to be displayed. It inserts the function call at the top of each basic block with the corresponding arguments, and I verified that the function calls are exactly what we would expect in the modified LLVM code dump. I also executed these manually for testing.

Conclusions

I spent a bunch of time trying to figure out how we could best generate temporary files in
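As a sketch of what the runtime side of such a pass might look like (the hook name `bb_report` and its signature are my invention, not the author's), the inserted calls could target a function like this:

```c
#include <stdio.h>

/* Hypothetical runtime hook: the pass would insert a call to this at the
 * top of each basic block, passing the block's index and its statically
 * counted number of instructions. */
static long total_instrs = 0;

void bb_report(int block_index, int n_instrs) {
    total_instrs += n_instrs;
    printf("block %d: %d instructions\n", block_index, n_instrs);
}

/* Running total reported so far (useful for a final summary). */
long bb_total(void) { return total_instrs; }
```

Linking a file like this into the instrumented program is all the runtime side needs; the IRBuilder call the pass emits just has to match the signature.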
  
-
(in collaboration with @lisarli and @dhan0779) Code can be found here. We wrote our LLVM pass to target a specific program: a serial implementation of a toy particle simulation where the particles repel each other. Compiling this program to LLVM IR with the -O3 flag, we saw this instruction sequence:

```llvm
  %6 = fmul <2 x double> %5, %5
  %7 = extractelement <2 x double> %6, i64 1
  %8 = extractelement <2 x double> %5, i64 0
  %9 = tail call double @llvm.fmuladd.f64(double %8, double %8, double %7)
```

which corresponds to this C++ code:

```cpp
double dx = neighbor->x - particle->x;
double dy = neighbor->y - particle->y;
double r2 = dx * dx + dy * dy;
```

and compiles down to this ARM assembly:

```asm
	fsub.2d	v0, v0, v1
	fmul.2d	v1, v0, v0
	mov	d1, v1[1]
	fmadd	d1, d0, d0, d1
```

We thought this was a bit odd, since we're essentially redoing a multiplication operation. In theory, it seems like this should be equivalent to the following instruction sequence, which doesn't redo the multiplication on the first element of the %5 vector:

```llvm
  %6 = fmul <2 x double> %5, %5
  %7 = extractelement <2 x double> %6, i64 1
  %8 = extractelement <2 x double> %6, i64 0
  %9 = fadd %7, %8
```

Although we figured there was probably a good reason for this (such as the first version being faster somehow, or the two versions not being completely equivalent due to some floating-point error), we wanted to try both versions and see if our version would run faster (or generate different assembly) than the version output by  For our pass, we found the pattern-matching constructs a bit clunky to work with, so we just used  After running  the generated LLVM IR is as shown:

```llvm
  %6 = fmul <2 x double> %5, %5
  %7 = extractelement <2 x double> %6, i64 1
  %8 = extractelement <2 x double> %6, i64 0
  %9 = fadd double %7, %8
```

and the final ARM assembly output is as shown:

```asm
	fsub.2d	v0, v0, v1
	fmul.2d	v1, v0, v0
	faddp.2d	d1, v1
```

Notably, the ARM assembly uses a  We ran the simulation a few times at different particle counts with the same starting seed. Unfortunately, the results seem to be within the margin of error. I guess we did not end up finding a new LLVM optimization. We ran the simulation a few more times for those particle counts (N), and it seemed that for smaller particle counts our altered version was slightly faster, while for larger particle counts the original version was slightly faster; this difference could be chalked up to any number of confounding factors. We believe our work deserves a Michelin star or two because of the effort we made in inspecting the LLVM IR generated from various programs, finding generated instructions in the O3-optimized code that seemed like they might be inefficient, writing a transformation pass that makes a hypothesized optimization, and evaluating that hypothesized optimization. Note: Originally, we tried to optimize the LLVM IR snippet using a vector reduction intrinsic to sum up the elements of the
  
-
My pass was pretty simple: I randomly changed Add, Sub, Mul, and SDiv operations. I grabbed some C programs from the internet that seemed like they would be easy to test. Honestly, the more I tried testing, the more I regretted creating this pass: it was very annoying to test. I had to manually change some conditional statements in the code to avoid segfaults. It is obvious that this would not be very testable, especially when scaling to more complex programs, but I didn't realize how quickly programs would get too complex to test. The most complex program I got to was factorial. Testing was just manually looking at the prints and outputs. Overall this was silly, but a good way to get somewhat familiar with LLVM, especially the setup and basic instruction parsing/modification/creation. I believe this work receives a Michelin star; it wasn't extremely ambitious, but it meets the requirements.
  
-
The hardest part of this assignment was setting up LLVM, having it use my pass, and figuring out the CMake stuff. The posted tutorials were helpful, but I had to do some extra stuff I don't exactly understand to get it working. For this task I implemented a function inlining pass. I thought this would be interesting because there are different ways to configure this pass, such as how many rounds of inlining to do, as well as the instruction threshold for choosing when to inline. Starting out, I thought this was going to require a fair bit of thought: stitching code together would require generating new temps for the function arguments, making the right assignments, and replacing all of the callee's  I might go back and implement the inlining myself, but I expect it'll take a while to figure out. With my spare time I benchmarked the pass a bit. I grabbed some benchmarks from here and ran them for different configurations of inlining iterations and inlining thresholds. I got some graphs that look like these: In general, though, inlining seems like a pretty decent optimization averaged across the 4 benchmarks I tried: More intense benchmarking, with greater care toward curating a "representative" set of benchmarks, would probably reveal more precise trends that would inform how a "real compiler" would want to do inlining, but I thought it was cool to see some preliminary results.
  
-
        
Pass

I was not too ambitious this week. I implemented a heap instrumentation pass, which replaces the original malloc/free invocations with a customized version. The custom heap interface is defined in Rust as a thin wrapper over the system allocator (actually pretty boring; it calls libc anyway, I just wanted to play around with the Rust FFI a bit). It takes an extra function-name argument to keep track of heap statistics per function. An extra dump function is inserted right before the main function exits to print out the profiling result, including the total bytes allocated and the number of malloc/free calls made per function. The Rust crate is compiled into a cdylib and linked with the object file compiled from C. I have not done too many serious tests for this; I just ran it against a couple of C programs and manually checked the results. In LLVM, I mainly played around with the IRBuilder for

Conclusion

I think the hardest part of this lesson might be reading the LLVM docs and finding the right API to use. I sometimes used AI tools to help with that. This lesson did help me gain some familiarity with LLVM.
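The Rust wrapper itself isn't shown, but the same idea can be sketched in C (the names `traced_malloc`/`traced_free` are placeholders of mine, not the actual interface):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical replacements the pass could swap in for malloc/free.
 * The extra `fn` argument tags each call with the calling function's
 * name so statistics can be kept per function. */
static size_t bytes_allocated = 0;
static unsigned n_mallocs = 0, n_frees = 0;

void *traced_malloc(size_t size, const char *fn) {
    bytes_allocated += size;
    n_mallocs++;
    printf("[heap] %s: malloc(%zu)\n", fn, size);
    return malloc(size);
}

void traced_free(void *p, const char *fn) {
    n_frees++;
    printf("[heap] %s: free\n", fn);
    free(p);
}

/* Accessors a dump function (inserted before main returns) could print. */
size_t heap_bytes(void)     { return bytes_allocated; }
unsigned heap_mallocs(void) { return n_mallocs; }
unsigned heap_frees(void)   { return n_frees; }
```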
  
-
The task. For this task, I implemented a pass which adds, to each loop header, a statement to print out "Here we go again...". This helps to instill in the user a sense of déjà vu at each new iteration, and an increased frustration that the running code seems to take one step back for every two steps forward: "It already ran those instructions, doesn't it have something new to do?"

The implementation. While a simple and unambitious task, there were a few tricky parts:

The evaluation. I ran it on some simple example programs, and to find a real-world program where the output would nevertheless be interpretable, I decided to run it on an old version of GNU yes. (I removed some GNU coreutils-specific parts to make it easy to compile.) Here's an example: As you can see, there are many loops even in this simple program.
  
-
Group (@ngernest, @katherinewu312, @samuelbreckenridge)

We implemented a pass which replaces every division instruction with a  In other words, our pass turns instructions like:

```llvm
%11 = fdiv float 5.0, %10         ; Note that %10 may store the value 0
```

into:

```llvm
%11 = fcmp oeq float %10, 0.0
%12 = fdiv float 5.0, %10
%13 = select i1 %11, float 0.0, float %12     ; equivalent to the C ternary `%13 = %11 ? 0.0 : %12;`
```

We support both unsigned/signed integer division ( To aid debugging, every time our pass replaces a division instruction with  Implementing this pass was relatively straightforward, although we realized we  Simple example:

```c
int main() {
    float zero = 0;
    float result = 5 / zero;
    int use_site = (int) result + 2;
    return use_site;
}
```

If we use

```llvm
define i32 @main() {
  ...
  ret i32 poison
}
```

After performing various local optimizations, the compiler determines that this function returns a  However, if we use

```llvm
define i32 @main() {
  ...
  ret i32 2
}
```

The resultant LLVM code returns a concrete non-zero value, which is expected! This is because our  More complicated examples:
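In C terms, the fcmp/select rewrite behaves like a guarded division; a minimal sketch (the name `safe_div` is mine):

```c
/* C analogue of the pass's rewrite: the quotient is forced to 0.0 when
 * the divisor is zero, mirroring the fcmp oeq + select sequence. */
float safe_div(float x, float y) {
    return (y == 0.0f) ? 0.0f : x / y;
}
```

With this guard in place, the 5/0 in the example above yields 0.0f, so `(int) result + 2` is a well-defined 2 rather than a poison value.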
  
-
LLVM Pass

I am detailing the implementation of a basic version of compiler-enforced cooperation. To minimize the occurrence of head-of-line blocking for user-facing applications, recent works have shifted the task of context-switching between threads out of the operating system and into a userspace scheduler. Userspace schedulers with better knowledge of an application can make better scheduling decisions and make faster context switches (e.g., 30 cycles) using userspace thread libraries (e.g., Boost). This practice has been called compiler-enforced cooperation, as it relies on compiler instrumentation to enable the fast context switches. There are currently three implementations of compiler-enforced cooperation in the literature: (1) in the first, a sending thread sets a signal (writes to a shared memory location) which is periodically polled by a recipient thread at compiler-inserted probe points, directing the recipient to yield the core; (2) in the second, the polling thread checks the CPU's time-stamp counter at each probe point (e.g., using the rdtsc instruction) to check whether it has exceeded a predefined time quantum (e.g., 5 us). In this basic implementation, probe points are inserted every 5 LLVM IR instructions. The rdtsc instruction is called at every probe point, and a histogram of inter-probe-point times is printed at program completion; the goal was to see what the distribution of times between probe point invocations looked like. Similar approaches could be useful for ensuring adequate probe point density for real compiler-enforced cooperation passes or compiler-enabled code profiling.

Generative AI Disclaimer: I had some questions for ChatGPT after using it. ChatGPT said that o3-mini-high is about as good as an advanced LeetCoder. It also said we can expect 353,000 metric tons of CO2 per year (~83,500 gasoline-powered cars driving 11,500 miles each year at 22 MPG) if 800M people each made one LLM query per night. ChatGPT said they aim to hit 1 billion users by the end of this year, and I think most users are making more than one query (it took 7 for this assignment). Also, 4o just scaled the 0.0003 kWh for a single inference by 800,000,000 queries x 365 days. Servers consume energy while idle, energy is used for other infrastructure (e.g., cooling), and this doesn't factor in the embodied carbon of all the GPUs, so probably not such an accurate estimate.
  
-
Automatic memoization of pure functions

Code: https://github.com/ethanuppal/cs6120/tree/main/lesson7

Note: There are technically two "versions" of this pass,  I'm super proud of the pass I made. I spent over 13 hours on it. I used Rust to implement an LLVM pass that conservatively identifies pure functions with up to three integer parameters and automatically memoizes them to a configurable degree. By default, it only memoizes a parameter

For example, consider the following function:

```c
int add1(int a) {
    return a + 1;
}
```

It is "pure": it has no side effects. I used conservative heuristics on the instructions to determine whether a function was pure. There is some low-hanging fruit for improvement here. For example, I didn't do an initial pass to identify potential pure functions and then resolve interprocedural calls, so if a pure function calls another pure function it will not be identified as pure. I decided to only focus on pure functions with 3 or fewer  Then, I inject basic blocks into each function and hooks at each return. At the start, I check if the memoization tables even apply to the parameters, and if they do, whether we have already cached something. At each return, I inefficiently replace it with a branch on whether the parameters are in bounds of the memoization tables (reusing the same boolean variable from the header block; maybe I should have just recomputed it so as not to hamper register allocation) and, if so, whether it can use the cache. View the full memoized source code for the
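Roughly, the injected header block and return hooks behave like this hand-written version of add1 (a sketch under my own assumptions; the bounded table and in-bounds flag mimic the description above, not the pass's exact codegen):

```c
#define MEMO_SIZE 64  /* illustrative table size; the pass makes this configurable */

static int memo_val[MEMO_SIZE];
static int memo_set[MEMO_SIZE];

int add1(int a) {
    /* injected header block: does the table apply, and is it populated? */
    int in_bounds = (a >= 0 && a < MEMO_SIZE);
    if (in_bounds && memo_set[a])
        return memo_val[a];            /* cache hit */
    int result = a + 1;                /* original function body */
    if (in_bounds) {                   /* injected hook at the return */
        memo_val[a] = result;
        memo_set[a] = 1;
    }
    return result;
}
```

Out-of-bounds arguments simply fall through to the original body, which is why the header check matters.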
  
-
My Code is Here: https://github.com/mt-xing/cs6120/tree/main/llvm-pass-skeleton

This was an unfortunately busy week for me (2 grading sessions + my own prelim), and as a result I'm attempting something quite unambitious. In the very first week of CS 2112, Prof. Myers would always tell the freshmen that any good compiler will take an expression of the form  My pass searches every instruction to see if it is a binary op, and if so, whether it is a multiplication command. If both are true, it then checks if the right operand is a constant integer, and finally, whether it is 1) greater than 0, 2) less than or equal to the largest integer that can be safely stored as a double, and 3) whether the floor of the base-2 log of the value is equal to the base-2 log of the value. In other words, it is a multiplication by a power of 2 that I can safely compute with doubles without worrying about loss of precision. Anything that matches this pattern will be rewritten into a left shift, keeping the left operand the same and using the constant computed value of the base-2 log as the right operand. Like the example shown in class, I do not bother removing the old operand, instead relying upon dead code elimination to do that. For testing, I first wrote a manual test case with multiplication by many different integers. I verified manually by looking at the IR that only the ones that were powers of two were rewritten. I also verified that the original operands became dead code, unused in the future. Also, I ran the code to ensure the outputs matched before and after my optimization. I then, with the help of Annabel, found a nontrivial linear algebra library written in C. Compiling this with my pass, I also checked the output of the IR to see if it looked reasonable. The hardest part was getting the LLVM project to run on my computer at all.

I initially tried to make it work in Ubuntu on WSL, but no matter what I did, the pass would either print nothing or crash complaining about running out of memory. I tried cleaning everything, including CMake and LLVM, off my machine and reinstalling everything from scratch. It still would not work. I got this far in class and could not figure out why. I next tried to make it work on Windows, but first, the  So, out of options, I dipped into $15 of my leftover free Azure credits to spin up a Linux VM in the cloud. SSHing into this box meant I finally got the thing to compile and run, and that's where I ended up doing all my work.
  
-
For this assignment I aimed to create a simple profiler tool. The idea is that an instruction-count tool can keep track of the number of instructions that all functions in a C program execute, to give a rough idea of where bottlenecks might be occurring, or for simple empirical evidence in the analysis of algorithms. For my real-world example I implemented insertion sort in C and executed the sort on one array of 10 elements and another of 100 elements; the profiler tool identified that sorting 10 elements led to the execution of 1,570 instructions, whereas sorting 100 elements led to the execution of 110,065 instructions (in line with an O(n^2) algorithm). Overall, I did not aim for a super ambitious implementation this week, but I'm still happy with the results and about having a way to measure instruction count automatically for any C program.
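The quadratic growth the profiler observed can be sanity-checked at the source level by counting insertion sort's inner-loop steps directly (a rough analogue: this counts element shifts, not IR instructions):

```c
#include <stdlib.h>

/* Count the inner-loop shifts insertion sort performs. On reverse-sorted
 * input this is exactly n*(n-1)/2, matching the O(n^2) growth seen in
 * the profiled instruction counts. */
long insertion_sort_steps(int *a, int n) {
    long steps = 0;
    for (int i = 1; i < n; i++) {
        int key = a[i], j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
            steps++;
        }
        a[j + 1] = key;
    }
    return steps;
}

/* Worst case: a reverse-sorted array of length n. */
long steps_for_reversed(int n) {
    int *a = malloc((size_t)n * sizeof(int));
    for (int i = 0; i < n; i++) a[i] = n - i;
    long s = insertion_sort_steps(a, n);
    free(a);
    return s;
}
```

Going from n=10 to n=100 multiplies the worst-case step count by 110x (45 vs 4950), roughly the ~70x jump the profiler reported on the author's inputs.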
  
-
Code here. For this week's tasks, getting LLVM running on my computer was maybe the hardest part. Once I got it all running, though, the task itself was not difficult, especially since I used the template already given to us in the Skeleton.cpp file. I took all adds in a file and turned them into subtracts with the same arguments. The most difficult thing was certainly handling the difficult-to-parse LLVM syntax and the opacity of the documentation. I also have very limited experience with C++, so, though the syntax isn't as unintuitive there, I struggled with the one-two punch of two new language syntaxes. For testing, I used my own toy test cases, and ran the pass on a C file I found online, from this repo, which does matrix and vector manipulations. I worked (distantly; more like we consulted with each other) with Michael on this project, and he took the repo and stuck a lot of the files together into one big megafile, which I used to make sure everything ran smoothly.
  
-
Getting LLVM on my computer was nearly trivial. I use NixOS, so it was just a matter of adding  For the task, I initially wanted to make a pass that would replace the logical  Instead, I opted for something less ambitious and just made a pass that will print every time a function is called. The idea is that someone could possibly use this as a simple way to check something like how many times each function is called when their program runs, without going and adding print statements in every function. The actual implementation wasn't that difficult, as I mainly just modified the skeleton code in the tutorial. For testing, I made a simple C program that had 4 different ways to calculate the nth Fibonacci number, and got this output: I believe this work deserves a Michelin Star because, although it is a bit unambitious, this plugin could be useful in some (admittedly contrived) situations.
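Hand-instrumenting a naive Fibonacci shows what such a pass automates: a counter bump (or print) at every function entry (the counter is my addition for illustration):

```c
/* What the inserted hook amounts to: bump a counter at function entry. */
static long fib_calls = 0;

long fib(int n) {
    fib_calls++;  /* the pass inserts the equivalent of this line */
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

/* Number of calls made while computing fib(n). */
long calls_for(int n) {
    fib_calls = 0;
    fib(n);
    return fib_calls;
}
```

For the naive recursive version, the call count grows exponentially (computing fib(5) already takes 15 calls), which is exactly the kind of thing this pass makes visible without editing the source.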
  
-
I implemented an LLVM pass that rewrites floating-point division operations into multiplications by the reciprocal, essentially transforming x / y into x * (1 / y). To test my implementation, I used a simple C program containing several division cases: one with a variable divisor (a / b), one with a constant divisor (a / 4.0), and one with an expression that simplifies to a constant (a / (2.0 + 2.0)). I verified correctness by inspecting the emitted LLVM IR, which showed that non-constant divisions were replaced with a reciprocal computation followed by a multiplication, while constant cases were optimized directly by LLVM. I also ran the transformed binary to check for performance improvements, observing that the modified IR produced equivalent results with the potential for better runtime efficiency on non-constant operations. This work would be even cooler if I could extend the pass to handle more cases. For instance, I could avoid unnecessary floating-point extensions (the current implementation sometimes promotes operations to double and then truncates them back to float), thereby keeping the calculations entirely in float precision. Moreover, optimizing more complex cases, like hoisting the reciprocal calculation out of a loop when the divisor is computed at runtime but remains constant across iterations, could further boost performance. The hardest part of this task was safely modifying the IR without invalidating iterators; I solved this by first collecting the target instructions and erasing them afterward.
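One caveat worth noting about this transform: the reciprocal introduces an extra rounding step, so the result can drift from true division by an ulp or so. A quick C comparison (helper names mine):

```c
#include <math.h>

/* The pass's rewrite: x / y  ->  x * (1 / y). Division rounds once;
 * the rewritten form rounds the reciprocal and then the product. */
double div_direct(double x, double y) { return x / y; }
double div_recip(double x, double y)  { return x * (1.0 / y); }

double recip_error(double x, double y) {
    return fabs(div_direct(x, y) - div_recip(x, y));
}
```

This is presumably why LLVM itself only performs this rewrite under fast-math-style flags (or when the divisor makes it exact, e.g. powers of two).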
  
-
I wrote an LLVM pass that goes over each instruction in a program, and if it is an integer multiplication or division with at least one operand that's a power of 2, it's converted to a bit-shifting operation. I identify whether an integer value is a power of 2 by zero-extending it to a 64-bit unsigned integer with  I tested and evaluated the pass by writing a test program that generated 10 million operations that are a mix of integer multiplication (with the power of 2 randomly in either operand), signed integer division, and unsigned integer division. The result of each operation is output to a file, along with the runtime on the last line. I compiled and ran the test program with and without the SkeletonPass plugin included and diffed the resulting output files. In the end, it did not seem like the transformations had much of an effect. Averaged over 3 trials, running with the plugin took 4.68s while running without it took 4.71s. When I removed writing the output to files (and only included a print of the elapsed time), which may bias the results, and increased the number of operations to 10 billion, running with the plugin took 80.29s while running without it took 81.58s, so a very minor speedup for a contrived situation.
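A correctness subtlety worth flagging for the signed case (as a general pitfall, not a claim about this particular pass): C's signed division truncates toward zero, while an arithmetic right shift rounds toward negative infinity, so a naive sdiv-to-shift rewrite miscomputes negative dividends unless a bias is added first:

```c
/* Naive rewrite: wrong for negative x, e.g. -7 / 4 == -1 but -7 >> 2 == -2.
 * (Right-shifting a negative int is implementation-defined in C, but is an
 * arithmetic shift on mainstream compilers.) */
int sdiv_naive_shift(int x, int k) { return x >> k; }

/* Corrected rewrite: add 2^k - 1 to negative dividends before shifting,
 * which is essentially what compilers emit for signed division by 2^k. */
int sdiv_biased_shift(int x, int k) {
    int bias = (x >> 31) & ((1 << k) - 1);  /* 2^k - 1 if x < 0, else 0 */
    return (x + bias) >> k;
}
```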
  
-
https://github.com/smd21/cs6120_homework

Once I got everything running, I implemented a pretty barebones pass that just prints out a message when a function is called or a branch is taken. I tried to get this pass to print out the name of the function/branch, but ran into some issues due to not really understanding how stuff is stored in the API. I have a sense of where I went wrong (I think I made some faulty assumptions), but am still working on fully implementing this functionality. For testing, I ran my program on a couple of different C files that call various functions from main and have loops/if statements/etc. to ensure the IR had branch statements. I inspected the output manually and made sure it seemed to be what I expected. (Honestly, I kind of used this as an opportunity to just see the difference between all the control flow instructions in LLVM, lol.)
  
-
I implemented some conversions into bit operations where possible, like multiplication -> left shift, division -> right shift, or modulo -> bitwise AND. I relied a lot on the isPowerOf2 helper that LLVM comes with, which proved kind of handy. The implementation was fairly simple; a lot of issues were just from setting things up. I also attempted some relatively basic constant folding, but encountered some weird behaviors. I had a line  I tested it on a few manually written programs to see that the instructions were being replaced and the original instructions were being removed. I've been trying to get this running on a more "industrial-strength" program, specifically x264. It seems kind of hard to know for sure if you compiled something with only the pass you want, especially in a more complicated make system. I've found some benchmarks for x264 online (phoronix-test-suite), which I'm going to try running once they finish downloading. Hopefully there will be some differences in performance, but I'm a little pessimistic about this, since I expect multiplication and division operations to already be fairly efficient in hardware compared to other instructions.
  
-
        
Summary

I made a tool that checks if an instruction is a multiplication instruction; if so, I print out that a multiplication instruction was found. I also had a part of my pass where, if an instruction was a binary operator and its arguments were commutable, I would swap the arguments. To find these methods in the LLVM internal rep, I spent a good amount of time on the BinaryOperator doxygen.

Testing

I took a matrix multiplication benchmark from a benchmarks repo to test my pass. For each of the dynamically executed multiplication instructions, I indeed saw my expected printed output. I also made sure to print out the LLVM IR code for this C program with and without my pass being registered; for the binary operator instructions where swapping the arguments was equivalent, I made sure that these instructions had their arguments swapped.

Hardest Part

For me, the hardest part was navigating the documentation and figuring out the types of methods and general C++ syntax. I have never written any C++ before, so this was cool, but also a little overwhelming. I feel like it would be really cool to get good at playing around with the LLVM internal rep, though! I think my work deserves a star!
  
-
Here is my code: link

I am very late with this assignment. What I wanted to do was not necessarily ambitious, but I was curious to try something like it. There is a renderer called Mitsuba3. I wanted to do something to the IR of the renderer so that I could visually see the effect of the change. My original plan was to make all diffuse objects look darker; another idea was to make all green objects red. I did not manage to do exactly that, because the renderer uses complex data types and it was not obvious how to scale a color vector. I was hoping that I would be able to figure it out by explicitly multiplying the vector by a constant in the cpp code and comparing the IR of this function with the original one, but the difference between the two IRs was too hard to interpret. So, I decided to scale a scalar function by a constant. It took me some time to make it work, to be honest. For a while I could see no change when compiling with my pass. Later, I figured out that the reason was the variant of Mitsuba I was using (llvm_ad_rgb should have been replaced with scalar_rgb). Now it works. Here are the two different images: the original one and the modified one.
  
-
        
Overview

This is very late, for which we both apologize. For our pass, we implemented a handful of analyses and optimizations on multiplication and division operations, as well as safety checks on division. First, we implemented a checker for division by a constant 0, which handles such cases by just setting the expected quotient to zero. For this, we considered a few options, including preserving the dividend, or even setting the result to some arbitrary constant. Ultimately, we ended up doing the zero resetting and also outputting an informative message. For cases of dividing by a nonzero constant, we implemented a right-shifting optimization, such that we right-shift as much as possible before continuing with the remaining division. Every time a division instruction with a constant divisor is identified, it is noted, along with the corresponding optimization. For multiplication, we implemented left-shifting for multiplication done by a constant on either the LHS or RHS. We also put in constant propagation of sorts (that is to say,  Here's a little snippet of the outputs given as a program is optimized:

Testing

During development, we tested against a simple handmade calculator program we wrote, which is accessible in our code. Subsequently, we put in benchmarks for fast exponentiation and factorial computation, and are testing on a handful of others we found in the LLVM benchmarks repository. Given how short these programs already are, it's not too clear whether this greatly optimizes the code, but it does preserve correctness, as outputs remain consistent!

Takeaways

I (somewhat surprisingly) found this more challenging than a lot of the other exercises. While I find pure C++ in general to be pretty straightforward (minus debugging), it was difficult for me to wrap my head around some of the syntax and the different methods. However, after taking some time to actually review the object structure of program representations and the different keywords, this just turned into plain old C++, and it became a relatively relaxing exercise.
  
-
code
  
-
I implemented a super simple inliner fuzzer for LLVM.¹ The goal is to help explore the inliner decision space. Several ideas for using this include:

But I did not have time to explore any of these cool things. Right now, the pass uses an RNG that can be seeded at compile time, e.g. like so:

Correctness. This was tested on two hand-written C programs: a simple Fibonacci program and a simple addition program. It's more likely than not correct, as I used several built-in LLVM utilities and modeled my code off of the built-in LLVM

Footnotes
  






-
Share your experiences getting started with writing an LLVM pass here!