Discussion on Provably Correct Peephole Optimizations with Alive #597
Replies: 15 comments 7 replies
-
|
critique The paper appears to do something which retroactively seems somewhat obvious --- trying to verify that optimisations replace the code with something which doesn't introduce more undefined behaviour. The paper suggests that this is a more nuanced process than this 'easy' question suggests, and provides a pretty well-rounded full implementation to actually help apply these ideas. Subjectively it feels a little light on metrics, but correctness is something which is not necessarily easy to quantify the effect of. They mention the effect of their code on compile times, but if the software produced is less prone to error, then perhaps such a tradeoff is not the worst thing in the world. And if an optimisation increases performance but has a chance of breaking code, then attaching a quantity to its increase is perhaps not that meaningful. All this to say, correctness is not necessarily quantifiable, but is nonetheless a valuable 'subjective' metric to mention; and it is actually put into practice by the authors, which is pretty neat. Also, looking at alive2 by the same authors, they add a Redis database dependency... I wonder if this is for SMT query caching. questions How useful is such an approach for potential IRs with minimal undefined behaviour? Is such a rigorous proof unnecessary or might it actually start to negatively affect performance (of compiler / generated code)? Are the optimisations proven by Alive either too or insufficiently cautious? For example, other than the explicit lack of floating-point and inter-procedural analysis, are there cases where Alive might assume something that isn't necessarily true; or where it requires too many assumptions? How can we evaluate 'correctness' as a metric in PL / compiler design --- is it even worth attaching numerical value to? |
Beta Was this translation helpful? Give feedback.
-
CritiqueThis paper introduces the concept of peephole optimization, where we examine peepholes (short sequences of instructions) and replace them with shorter/faster equivalents without altering the program's behavior. For me, this optimization needs to be paired with other global optimizations to fully optimize a program. While I do think this optimization technique is highly effective, the surface area that it can optimize is limited because it only optimizes small instruction windows. Additionally, I am curious about the practicality of using Alive to verify transformations. The authors mention that “for some transformations involving multiplication and division instructions, Alive can take several hours or longer to verify the larger bitwidths”. This seems like an unreasonably long amount of time to wait for a single verification. Questions
|
Beta Was this translation helpful? Give feedback.
-
|
Critique: Question: Alive is entirely dependent on LLVM IR. Since Alive is so reliant on LLVM’s semantics, what might happen if LLVM’s IR changed in the future? Do you think that would break Alive, or could it adapt easily? |
Beta Was this translation helpful? Give feedback.
-
CritiqueIt's really cool that Alive managed to find hidden bugs in LLVM, and designing it to look like LLVM code was a great choice for getting developers on board. However, a downside to Alive is that it runs into significant performance bottleneck when verifying complex arithmetic operations. The paper admits that checks involving multiplication or division can take hours for the SMT solvers to process because of extremely large numbers, which is a big practical limitation. I wonder if there's a way to make it faster, maybe by using a quicker but less precise check first, before engaging the heavy SMT solvers for the most challenging cases. QuestionAlive is explicitly designed for peephole optimizations, so it does not reason about control flow (branches and loops). How could we extend Alive beyond local changes to verify larger transformations that span multiple basic blocks? How would the verification complexity change? What new set of formal issues would arise that are not present in local peephole optimization? |
Beta Was this translation helpful? Give feedback.
-
CritiqueThe Alive paper presents a practical approach to formally verifying compiler optimizations, particularly within LLVM’s InstCombine pass. While effective, this scope is narrow - it raises the question of whether the same SMT-based framework could scale to more complex middle-end or backend optimizations that involve richer control flow or memory interactions. The work also highlights how informal many compiler optimizations are. The fact that Alive can find bugs in production passes shows that compilers often outpace our ability to reason about them formally. This might suggest a need for future IRs designed with verifiability and semantic clarity in mind, not just performance. I’m also intrigued by the way Alive formalizes LLVM’s permissive undefined behavior (UB) model. While necessary for verification, it also exposes how fragile such semantics can be. Tools like Alive might push IRs toward more explicit, well-defined semantics that are easier to reason about. Discussion Questions
|
Beta Was this translation helpful? Give feedback.
-
|
I really liked this paper; I found it quite clear and painless to read. I have absolutely no expertise in formal verification, but the correctness constraints were clear and intuitive, and I appreciate that the authors designed the tool to be easily usable by developers and demonstrated its utility in practice by finding bugs in actual LLVM optimizations. I guess my main comments/questions might be a bit outside the scope of the paper. (1) SMT solvers can be very costly for transformations involving large bitwidths; while I guess this isn't an area where speed directly matters, it does raise questions about scalability and whether it's possible to improve or bypass solvers. (2) This paper is limited to peephole optimizations. Verifying other (non-local, control flow aware) optimizations seems like a very difficult task; is there any work/tools out there which do this, and what would that look like? |
Beta Was this translation helpful? Give feedback.
-
|
Critique: I agree with @jeffreyqdd that the paper was well written. My reading gave me a clear understanding of the system they were trying to implement. I particularly found the proof of correctness of transformations enlightening. Not only did they details how to prove their transformations were correct, they also gave a very specific example showing how it was done. My only concern is that there are various instances in the paper where they allude to certain formulas being difficult or slow. It seems like these may have been brushed aside or left for future work. Discussion Questions: It seems that the authors intended for this to slowly replace LLVM (or something like it). LLVM is obviously massive; what transitional help did they give? Specifically, how do they offer authors the ability to expand on this and use alive in more complex manners. |
Beta Was this translation helpful? Give feedback.
-
|
##Critique## ##Discussion Questions## |
Beta Was this translation helpful? Give feedback.
-
|
Critique: I found the last paragraph of 3.3.3 interesting. Due to efficiency concerns with SMT solvers, the authors choose an encoding for memory instructions linear in the number of loads and stores. This seems practically effective, but looses information compared to encodings build directly from arrays. The authors note that this wasn't required by any optimization they analyzed. This makes me wonder if there is any interesting reason these optimizations don't exist, especially with undefined behavior taken advantage of in other optimizations. Questions: I found the paragraph at the end of 5, threats to validity interesting. It mentioned the model may not properly represent the semantics intended by the developers of LLVM. Given the use of Alive as almost like a checker for validity of peephole optimizations, I wonder what other formalization of the semantics of these subsets of LLVM exist and if they are consistent with Alive's. P.S. Sorry for the lateness, I may or may not have forgot this deadline existed. |
Beta Was this translation helpful? Give feedback.
-
|
Critique: I find the paper interesting in how Alive combines practicality with formal verification. The design of the Alive DSL is intuitive, as it closely resembles LLVM IR, which makes adoption by LLVM developers much easier. Also, C++ code generation feature makes Alive practical to incorporate into LLVM development. I also like that the tool produces human-readable counterexamples, choosing examples that are easy to understand. And it's impressive that Alive actually discovered real bugs in several LLVM optimizations. However, the paper also discusses a few limitations. For example, Alive currently doesn't support floating-point number and branches, which limits the range of optimizations it can express. Moreover, there are scalability issues: the SMT solver can take many hours to verify certain transformations, particularly those involving multiplication or division. Questions:
|
Beta Was this translation helpful? Give feedback.
-
CritiqueAlive is a combination of (1) a DSL for expressing optimizations (especially for peephole optimizations in LLVM), (2) an SMT-based verifier for correctness, and (3) an automatic code generator for LLVM. It is a very cool tool for developers to write correct peephole optimizations. Questions
|
Beta Was this translation helpful? Give feedback.
-
CritiqueI think its super cool the amount of program verification and analysis we can make for just peephole optimizations. The casework and proof about undefined behavior within LLVM was very intuitive, and I like how they handled verification for unsafe memory as well. I am also amazed that Alive and its successor Alive2 is being consistently used to verify programs. One critique I had was I wish there was more explanation on why they decided to implemented Alive in Python, and how that impacts overhead when attaching the verifier onto LLVM passes. In diving down the rabbit hole of Alive2, I realize they were able to optimize the verifier by rewriting it in C++, optimizing away the memory bottlenecks for the SMT solver that the Python implementation had. QuestionsI had a question relating to @SerenaYZhang and @maheshejs comments above - what simplifications could be made in a multi-pass verification such that SMT solvers could be more bounded? Is there a way to simplify the many multiplication and division operations - perhaps split the numbers and do the SMT solving in parallel? |
Beta Was this translation helpful? Give feedback.
-
|
critique question |
Beta Was this translation helpful? Give feedback.
-
|
Unfortunately, I couldn't join the discussion this morning, but from what I hear it was engaging :) Critique: I love the mathy bits of this paper. As a PL enthusiast, I really appreciate formally defined type systems and grammars. The ease of use of Alive for developers familiar with LLVM is also remarkable. I also feel like I gained insight into LLVM itself and its weak points by reading the paper. The counterexample generation is super neat. The correctness theorems also make me quite happy and are relatively easy to follow. The low-hanging critique here is, Alive successfully automates verification for many LLVM passes, but it has verification bounds (like limited bitwidths and partial coverage) and so it's reasonable to assume that it leaves gaps that could miss errors. I haven't dived into Alive2, so I wonder if the coverage is better there! Question: How usable is this? How does this approach balance the tradeoff between actual formal verification and practical usability for developers? |
Beta Was this translation helpful? Give feedback.
-
|
Critique : Question: |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Discussion thread for: Provably Correct Peephole Optimizations with Alive
Discussion leads: Nikil Shyamsunder, Shihan Fang, Joseph Maheshe, Ruolin Ye, I-Ting Tsai
Beta Was this translation helpful? Give feedback.
All reactions