Discussion on Provably Correct Peephole Optimizations with Alive #597

Nikil-Shyamsunder · 2025-10-09T18:55:49Z

Discussion thread for: Provably Correct Peephole Optimizations with Alive
Discussion leads: Nikil Shyamsunder, Shihan Fang, Joseph Maheshe, Ruolin Ye, I-Ting Tsai

mercure67 · 2025-10-14T14:45:31Z

critique

The paper appears to do something which in retrospect seems somewhat obvious --- trying to verify that optimisations replace the code with something which doesn't introduce more undefined behaviour. The paper suggests that this is a more nuanced process than this 'easy' question suggests, and provides a pretty well-rounded full implementation to actually help apply these ideas. Subjectively it feels a little light on metrics, but the effect of correctness is not necessarily easy to quantify. They mention the effect of their code on compile times, but if the software produced is less prone to error, then perhaps such a tradeoff is not the worst thing in the world. And if an optimisation increases performance but has a chance of breaking code, then attaching a number to that speedup is perhaps not that meaningful. All this to say, correctness is not necessarily quantifiable, but is nonetheless a valuable 'subjective' metric to mention; and it is actually put into practice by the authors, which is pretty neat.

Also, looking at alive2 by the same authors, they add a Redis database dependency... I wonder if this is for SMT query caching.

questions

How useful is such an approach for potential IRs with minimal undefined behaviour? Is such a rigorous proof unnecessary or might it actually start to negatively affect performance (of compiler / generated code)?

Are the optimisations proven by Alive either too or insufficiently cautious? For example, other than the explicit lack of floating-point and inter-procedural analysis, are there cases where Alive might assume something that isn't necessarily true; or where it requires too many assumptions?

How can we evaluate 'correctness' as a metric in PL / compiler design --- is it even worth attaching numerical value to?

1 reply

maheshejs Oct 15, 2025

Very thoughtful questions, thanks! And thanks for the link to Alive2!

I'd say it is still useful to check the correctness of those optimizations even when there is minimal undefined behavior. While it's true that the bulk of the bugs found involved the introduction of undefined behavior (see section 6.1), Alive was also able to find two bugs where the value of an expression was incorrect for some inputs.
Yes, they mention in section 5 that there could be differences between their formalization and the semantics intended by LLVM developers. One example is that Alive’s bounded verification could lead it to incorrectly verify an optimization, though they argue it is uncommon to see operands wider than 64 bits in LLVM code.
Yes, it's not so easy to quantify correctness. Formal verification could be one way, but that could be too expensive for a whole compiler pipeline.
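
To make the "introduces undefined behavior" kind of bug concrete, here is a minimal sketch of a refinement check. It is an illustration only: Alive discharges these checks with an SMT solver, whereas this sketch simply enumerates every input at a few small bitwidths, and it models only immediate UB (poison and undef are left out). The helper names (`sdiv`, `neg`, `find_counterexample`) are made up for the example. It checks the rewrite between `x sdiv -1` and `0 - x` in both directions.

```python
UB = "UB"  # sentinel for undefined behavior (poison/undef omitted for brevity)

def to_signed(v, w):
    """Interpret the w-bit unsigned value v as two's complement."""
    return v - (1 << w) if v >> (w - 1) else v

def sdiv(a, b, w):
    """Signed division; UB on divide-by-zero and on INT_MIN / -1."""
    sa, sb = to_signed(a, w), to_signed(b, w)
    if sb == 0 or (sa == -(1 << (w - 1)) and sb == -1):
        return UB
    q = abs(sa) // abs(sb)          # truncate toward zero
    if (sa < 0) != (sb < 0):
        q = -q
    return q % (1 << w)

def neg(x, w):
    """0 - x with wrapping arithmetic; always defined."""
    return (-x) % (1 << w)

def div_by_minus_one(x, w):
    """x sdiv -1, where -1 is the all-ones w-bit pattern."""
    return sdiv(x, (1 << w) - 1, w)

def find_counterexample(src, tgt, w):
    """Return the first input where tgt fails to refine src, else None."""
    for x in range(1 << w):
        s, t = src(x, w), tgt(x, w)
        if s == UB:
            continue    # source is undefined here, so the target may do anything
        if t != s:
            return x    # target is UB or computes a different value
    return None

for w in (1, 2, 4, 8):
    # Removing the division is fine: the target is defined wherever the source is.
    assert find_counterexample(div_by_minus_one, neg, w) is None
    # Introducing the division is not: INT_MIN becomes UB in the target.
    print(w, find_counterexample(neg, div_by_minus_one, w))
```

The same rewrite is accepted in one direction and rejected in the other, which is why the check is a refinement rather than a plain equivalence. Looping over several widths loosely mirrors the bounded verification mentioned above: a rewrite checked only up to some width is not thereby proven for wider types.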

Jacqueline-Wen · 2025-10-14T21:21:40Z

Critique

This paper introduces the concept of peephole optimization, where we examine peepholes (short sequences of instructions) and replace them with shorter/faster equivalents without altering the program's behavior.
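
As a rough illustration of what such a rewrite looks like in code, here is a toy peephole pass. The instruction format (tuples of destination, opcode, and two operands) is invented for this sketch and is not LLVM IR, and the window is a single instruction for simplicity.

```python
# Toy peephole pass over a straight-line instruction list.
# Each instruction is (dest, opcode, lhs, rhs); operands are register
# names (str) or integer constants. Hypothetical format, not LLVM IR.

def peephole(instrs):
    out = []
    for dest, op, lhs, rhs in instrs:
        if op == "mul" and rhs == 2:
            # x * 2  ==>  x << 1  (strength reduction)
            out.append((dest, "shl", lhs, 1))
        elif op == "add" and rhs == 0:
            # x + 0  ==>  x  (expressed here as a copy)
            out.append((dest, "copy", lhs, None))
        else:
            out.append((dest, op, lhs, rhs))
    return out

program = [
    ("a", "mul", "x", 2),
    ("b", "add", "a", 0),
    ("c", "sub", "b", "y"),
]
optimized = peephole(program)
print(optimized)
```

Each rule only looks at a tiny window and knows nothing about the rest of the program, which is what makes these rewrites easy to state and also easy to get subtly wrong.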

For me, this optimization needs to be paired with other global optimizations to fully optimize a program. While I do think this optimization technique is highly effective, the surface area that it can optimize is limited because it only optimizes small instruction windows.

Additionally, I am curious about the practicality of using Alive to verify transformations. The authors mention that “for some transformations involving multiplication and division instructions, Alive can take several hours or longer to verify the larger bitwidths”. This seems like an unreasonably long amount of time to wait for a single verification.

Questions

How widely used is Alive in 2025?
If it is still in use, what are the common use cases?

1 reply

maheshejs Oct 15, 2025

Great questions! Alive, especially its successor, Alive2, is still widely used these days (its most recent commit dates from just 2 days ago!). Developers use it to find and fix bugs in the LLVM compiler and to ensure new optimizations are sound. Alive2 extends Alive and supports any intra-procedural optimization.

magg1egao · 2025-10-15T01:56:39Z

Critique:
This paper introduces Alive, a domain-specific language and verification framework designed to ensure the correctness of peephole optimizations in the LLVM compiler infrastructure. I think the paper does a nice job of explaining how this formal verification can actually be used in real compiler development. Alive makes complex LLVM transformations much easier to read and reason about. Alive specifications can also be translated automatically into C++ code compatible with LLVM’s InstCombine pass. It’s nice how the authors make formal correctness accessible to everyday LLVM developers. However, although it is accessible, I wonder how practical Alive is for day-to-day use. The paper mentions that some verifications can take several hours or longer at larger bitwidths. It would be quite inconvenient for compiler engineers to adopt this if it slows down their workflow significantly. Nevertheless, it’s very cool that the tool actually found real LLVM bugs that had gone unnoticed for years.

Question:
In the paper, Alive verifies the correctness of peephole optimizations in isolation. However, how would these optimizations fare in the context of a compiler pipeline with multiple optimizations?
Could there be cases where individually correct optimizations interfere with each other or change the expected results when applied together?

Alive is entirely dependent on LLVM IR. Since Alive is so reliant on LLVM’s semantics, what might happen if LLVM’s IR changed in the future? Do you think that would break Alive, or could it adapt easily?

1 reply

maheshejs Oct 15, 2025

Yup, found it really cool that they found real LLVM bugs that had been patched too.

Regarding multiple optimizations, Alive was designed to verify peephole optimizations only. However, it was extended in 2021 with Alive2, which can handle whole-function verification across different optimizations. Alive2 claims a zero false-alarm rate, which suggests that composing correct optimizations results in a similarly correct optimization.

Yes, Alive is entirely dependent on LLVM IR (which has become something of a universal IR), since Alive was designed with the goal of checking the correctness of LLVM optimizations. If LLVM's IR happens to change in the future, I'd say Alive would follow the new LLVM semantics.

SerenaYZhang · 2025-10-15T03:01:04Z

Critique

It's really cool that Alive managed to find hidden bugs in LLVM, and designing it to look like LLVM code was a great choice for getting developers on board. However, a downside to Alive is that it runs into a significant performance bottleneck when verifying complex arithmetic operations. The paper admits that checks involving multiplication or division can take hours for the SMT solvers to process because of extremely large numbers, which is a big practical limitation. I wonder if there's a way to make it faster, maybe by using a quicker but less precise check first, before engaging the heavy SMT solvers for the most challenging cases.

Question

Alive is explicitly designed for peephole optimizations, so it does not reason about control flow (branches and loops). How could we extend Alive beyond local changes to verify larger transformations that span multiple basic blocks? How would the verification complexity change? What new set of formal issues would arise that are not present in local peephole optimization?

1 reply

maheshejs Oct 15, 2025

Yeah, would be a nice idea to have multi-stage filtering, as you suggest, with SMT coming last!
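
A minimal sketch of what such a staged pipeline could look like, under loud assumptions: stage 1 is plain random testing at a realistic width, and stage 2 is exhaustive enumeration at a tiny width, which merely stands in for the SMT solver here. All function names and the example rewrites are hypothetical, and UB/poison are ignored so that a rewrite is just a pair of total functions.

```python
import random

def cheap_random_check(src, tgt, w, trials=1000, seed=0):
    """Stage 1: random testing. Can refute a rewrite quickly, proves nothing."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.getrandbits(w), rng.getrandbits(w)
        if src(x, y, w) != tgt(x, y, w):
            return (x, y)
    return None

def exhaustive_check(src, tgt, w):
    """Stage 2: stand-in for the solver; only feasible at tiny widths."""
    for x in range(1 << w):
        for y in range(1 << w):
            if src(x, y, w) != tgt(x, y, w):
                return (x, y)
    return None

def verify(src, tgt, w_random=32, w_exhaustive=6):
    cex = cheap_random_check(src, tgt, w_random)
    if cex is not None:
        return ("rejected by random testing", cex)
    cex = exhaustive_check(src, tgt, w_exhaustive)
    if cex is not None:
        return ("rejected by exhaustive check", cex)
    return ("no counterexample found", None)

# x * (y + 1)  ==>  x * y + x   (correct under wrapping arithmetic)
def mul_src(x, y, w): return (x * (y + 1)) % (1 << w)
def mul_tgt(x, y, w): return (x * y + x) % (1 << w)

# x + y  ==>  x | y   (wrong whenever x and y share a set bit)
def add(x, y, w): return (x + y) % (1 << w)
def bit_or(x, y, w): return x | y

# (x != y)  ==>  true   (contrived: wrong only when x == y)
def ne(x, y, w): return int(x != y)
def always_true(x, y, w): return 1

print(verify(mul_src, mul_tgt))
print(verify(add, bit_or))
print(verify(ne, always_true))
```

The last rewrite is the interesting case: random 32-bit inputs are very unlikely to hit `x == y`, so the cheap stage will almost certainly pass it and only the complete stage rejects it. That is the main caveat of the idea: a fast filter can only discard bad rewrites early, and a pass from it still has to be followed by the full check.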

As for control flow, yes, Alive only targets peephole optimizations within a single basic block. However, it has been extended in Alive2 to support CFGs and whole-function verification in the 2021 paper "Alive2: Bounded Translation Validation for LLVM".

SyphonArch · 2025-10-15T19:14:28Z

Critique

The Alive paper presents a practical approach to formally verifying compiler optimizations, particularly within LLVM’s InstCombine pass. While effective, this scope is narrow - it raises the question of whether the same SMT-based framework could scale to more complex middle-end or backend optimizations that involve richer control flow or memory interactions.

The work also highlights how informal many compiler optimizations are. The fact that Alive can find bugs in production passes shows that compilers often outpace our ability to reason about them formally. This might suggest a need for future IRs designed with verifiability and semantic clarity in mind, not just performance.

I’m also intrigued by the way Alive formalizes LLVM’s permissive undefined behavior (UB) model. While necessary for verification, it also exposes how fragile such semantics can be. Tools like Alive might push IRs toward more explicit, well-defined semantics that are easier to reason about.

Discussion Questions

Should compilers continue relying on UB for performance, or should IRs move toward more explicit "defined overflow" and "checked behavior"?
Is SMT solving a scalable foundation for everyday compiler development, or is it better suited as an offline validation step?
What are the most realistic barriers to integrating Alive-generated rules directly into production compiler pipelines?
How could one extend Alive’s approach to optimizations that modify control flow (like loop unrolling or inlining)?

1 reply

sunwookim028 Oct 16, 2025

I am also curious to know how Alive performs in different scenarios, and why. My discussion question is: does a domain-specific language for writing passes enable portability? Can we design a pass DSL that is purely algebraic, so that it is both very portable and verifiable?

az275 · 2025-10-15T20:51:38Z

I really liked this paper; I found it quite clear and painless to read. I have absolutely no expertise in formal verification, but the correctness constraints were clear and intuitive, and I appreciate that the authors designed the tool to be easily usable by developers and demonstrated its utility in practice by finding bugs in actual LLVM optimizations. I guess my main comments/questions might be a bit outside the scope of the paper. (1) SMT solvers can be very costly for transformations involving large bitwidths; while I guess this isn't an area where speed directly matters, it does raise questions about scalability and whether it's possible to improve or bypass solvers. (2) This paper is limited to peephole optimizations. Verifying other (non-local, control flow aware) optimizations seems like a very difficult task; is there any work/tools out there which do this, and what would that look like?

1 reply

maheshejs Oct 15, 2025

I found it super clear to read too.

(1) Regarding longer verification times, the authors hope for further improvements in SMT solvers, since their work relies on those solvers. As for bypassing them, @SerenaYZhang hinted at a quicker but less precise check preceding the use of SMT solvers. I think that's a nice idea to explore for filtering out simpler cases.

(2) Yes, there is a tool called Alive2 which extends Alive. Alive2 supports any intra-procedural optimization.

tf-mac · 2025-10-16T01:49:27Z

Critique:

I agree with @jeffreyqdd that the paper was well written. My reading gave me a clear understanding of the system they were trying to implement. I particularly found the proof of correctness of transformations enlightening. Not only did they detail how to prove their transformations were correct, they also gave a very specific example showing how it was done. My only concern is that there are various instances in the paper where they allude to certain formulas being difficult or slow. It seems like these may have been brushed aside or left for future work.

Discussion Questions:

It seems that the authors intended for this to slowly replace LLVM (or something like it). LLVM is obviously massive; what transitional help did they give? Specifically, how do they let developers build on this and use Alive in more complex ways?

0 replies

Smubge · 2025-10-16T01:57:19Z

Critique
This paper was very interesting, and seemed to strike a balance between theory and practicality. I was very interested in how undefined values were split up and dealt with. Overall, the paper was easy to understand, and Alive itself also seemed (on the surface, as I haven't used it) to be quite simple. I find it amazing that a few LLVM developers are already using Alive to avoid creating bugs, and that it is being used on LLVM patches. I wonder how Alive would compare to other compiler verification tools. On further research, I realized that Alive has a successor, Alive2.

Discussion Questions
How is Alive being used today? I know that Alive2, its successor, exists, but what is being done with Alive2?
What would happen if, instead of building Alive as a separate tool, the authors had chosen to embed these verifications within LLVM itself?

1 reply

Nikil-Shyamsunder Oct 16, 2025
Author

I thought it was interesting to look into Alive2 as well! After some digging on the internet, I found out that Alive2 is still actively used for automatically verifying optimizations for LLVM. I think embedding these proofs directly inside LLVM would have been much more complex, since it would tightly couple verification logic with compiler infrastructure; separating it into a tool allows independent reasoning and more flexible testing.

jku20 · 2025-10-16T02:28:26Z

Critique: I found the last paragraph of 3.3.3 interesting. Due to efficiency concerns with SMT solvers, the authors choose an encoding for memory instructions that is linear in the number of loads and stores. This seems practically effective, but loses information compared to encodings built directly from arrays. The authors note that this wasn't required by any optimization they analyzed. This makes me wonder if there is any interesting reason these optimizations don't exist, especially with undefined behavior taken advantage of in other optimizations.
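
For intuition about what an encoding "linear in the number of loads and stores" can look like, here is a small sketch. It is an assumption-laden illustration, not the paper's exact encoding: it ignores byte-level slicing, alignment, and allocation, and it represents terms as nested Python tuples rather than SMT expressions. A load is turned into a chain of if-then-else terms, one per preceding store, with the newest store checked first.

```python
def encode_load(ptr, stores):
    """Build a nested ite term for a load from `ptr`.

    stores: list of (pointer_name, value_name) pairs in program order.
    The term contains exactly one ite per preceding store.
    """
    term = ("initial", ptr)                 # value of untouched memory at ptr
    for q, v in stores:                     # oldest first, so newest ends up outermost
        term = ("ite", ("eq", ptr, q), v, term)
    return term

def evaluate(term, env, init_mem):
    """Evaluate a term under concrete pointer/value assignments."""
    if isinstance(term, str):
        return env[term]
    tag = term[0]
    if tag == "eq":
        return env[term[1]] == env[term[2]]
    if tag == "ite":
        chosen = term[2] if evaluate(term[1], env, init_mem) else term[3]
        return evaluate(chosen, env, init_mem)
    if tag == "initial":
        return init_mem.get(env[term[1]], 0)
    raise ValueError(f"unknown term tag: {tag}")

# store a -> p ; store b -> q ; r = load
term = encode_load("r", [("p", "a"), ("q", "b")])
print(term)

values = {"a": 1, "b": 2}
print(evaluate(term, {**values, "p": 100, "q": 200, "r": 100}, {}))          # r aliases p
print(evaluate(term, {**values, "p": 100, "q": 100, "r": 100}, {}))          # p, q, r all alias
print(evaluate(term, {**values, "p": 100, "q": 200, "r": 300}, {300: 7}))    # r aliases neither
```

Because every store has to be listed explicitly, a scheme like this only works for a fixed, known sequence of memory operations, which is one way to read the "loses information" concern relative to an array-based encoding.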

Questions: I found the paragraph at the end of section 5 (threats to validity) interesting. It mentions that the model may not properly represent the semantics intended by the developers of LLVM. Given that Alive is used almost like a checker for the validity of peephole optimizations, I wonder what other formalizations of the semantics of these subsets of LLVM exist and whether they are consistent with Alive's.

P.S. Sorry for the lateness, I may or may not have forgotten this deadline existed.

0 replies

Mond45 · 2025-10-16T03:23:24Z

Critique:

I find the paper interesting in how Alive combines practicality with formal verification. The design of the Alive DSL is intuitive, as it closely resembles LLVM IR, which makes adoption by LLVM developers much easier. Also, the C++ code generation feature makes Alive practical to incorporate into LLVM development. I also like that the tool produces human-readable counterexamples, choosing examples that are easy to understand. And it's impressive that Alive actually discovered real bugs in several LLVM optimizations.

However, the paper also discusses a few limitations. For example, Alive currently doesn't support floating-point numbers or branches, which limits the range of optimizations it can express. Moreover, there are scalability issues: the SMT solver can take many hours to verify certain transformations, particularly those involving multiplication or division.

Questions:

How does Alive compare to randomized testing approaches (maybe, in terms of performance or coverage), for transformations that take longer to verify?
What approaches could be used to improve verification time for transformations involving multiplication or division?
How might floating-point support be incorporated into Alive?

0 replies

SolidLao · 2025-10-16T06:09:37Z

Critique

Alive is a combination of (1) a DSL for expressing optimizations (especially for peephole optimizations in LLVM), (2) an SMT-based verifier for correctness, and (3) an automatic code generator for LLVM. It is a very cool tool for developers to write correct peephole optimizations.

Questions

It still takes considerable effort to learn and use the Alive DSL. Alive was developed in 2015. Since then, has any tool appeared that lets users work directly with LLVM IR?
Does Alive2 support floating point, aggregate types, and labels? These are not supported in Alive as claimed by the authors.

0 replies

CynyuS · 2025-10-16T19:22:37Z

Critique

I think it's super cool how much program verification and analysis we can do for just peephole optimizations. The casework and proof about undefined behavior within LLVM was very intuitive, and I like how they handled verification for unsafe memory as well. I am also amazed that Alive and its successor Alive2 are being consistently used to verify programs. One critique I had was that I wish there was more explanation of why they decided to implement Alive in Python, and how that impacts overhead when attaching the verifier onto LLVM passes. In diving down the rabbit hole of Alive2, I realized they were able to optimize the verifier by rewriting it in C++, optimizing away the memory bottlenecks for the SMT solver that the Python implementation had.

Questions

I had a question relating to @SerenaYZhang's and @maheshejs's comments above: what simplifications could be made in a multi-pass verification so that the SMT solver's work is more bounded? Is there a way to simplify the many multiplication and division operations, perhaps by splitting the numbers and doing the SMT solving in parallel?

0 replies

zc579 · 2025-10-17T02:22:52Z

critique
The paper presents Alive, a language and tool for verifying and generating LLVM peephole optimizations, which offers notable contributions to compiler correctness. However, Alive fails to support key LLVM optimization scenarios widely used in industry: it does not handle floating-point operations, vector types, control flow constructs, direct heap memory operations, or full LLVM IR features. These omissions disconnect Alive from real-world compiler demands.

question
For control-flow optimizations, does Alive’s SSA-based directed graph verification model require restructuring? If restructured, how will the trade-off between verification efficiency and control-flow complexity be balanced?

0 replies

pedropontesgarcia · 2025-10-17T02:24:13Z

Unfortunately, I couldn't join the discussion this morning, but from what I hear it was engaging :)

Critique: I love the mathy bits of this paper. As a PL enthusiast, I really appreciate formally defined type systems and grammars. The ease of use of Alive for developers familiar with LLVM is also remarkable. I also feel like I gained insight into LLVM itself and its weak points by reading the paper. The counterexample generation is super neat. The correctness theorems also make me quite happy and are relatively easy to follow. The low-hanging critique here is, Alive successfully automates verification for many LLVM passes, but it has verification bounds (like limited bitwidths and partial coverage) and so it's reasonable to assume that it leaves gaps that could miss errors. I haven't dived into Alive2, so I wonder if the coverage is better there!

Question: How usable is this? How does this approach balance the tradeoff between actual formal verification and practical usability for developers?

0 replies

NingWang0123 · 2025-10-23T12:19:44Z

Critique:
Excellent UB/poison-aware specs that catch real InstCombine bugs, but solver time can spike on non-linear rewrites.
Coverage is uneven (historically weaker on FP/CFG), so some high-value rules sit outside its sweet spot.
Specs must track evolving LLVM semantics and flags.

Question:
Given that correctness checking is costly, how should we triage which peephole rules to verify exhaustively vs. quickly or not at all? Concretely, can we design a heuristic risk score H(rule) to prioritize Alive/Alive2 checks?

0 replies
