Conversation

@zygoloid
Contributor

Adds a flag --opt=<mode> that specifies what to optimize for:

  • --opt=none turns off the optimizer as much as possible, but still respects always_inline.
  • --opt=debug aims to be the equivalent of -Og / -O1, and provides optimizations that don't affect the ability to debug the program. This is the default.
  • --opt=size optimizes for the size of the produced program, and aims to be the equivalent of -Oz.
  • --opt=speed optimizes for the execution time of the produced program, and aims to be the equivalent of -O3.

Following the approach taken by Clang, the optimization level feeds into both the configuration of the LLVM pass pipeline and the attributes added to function definitions generated by the frontend.

Optimization is performed in a new phase, opt, which runs between lower and codegen.
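
For context, here's a minimal sketch (illustrative only, not the code in this change; `OptimizeModule` and its parameters are made-up names) of how an optimization level can feed both halves, the per-function attributes and the LLVM pass pipeline:

```cpp
#include "llvm/Analysis/CGSCCPassManager.h"
#include "llvm/Analysis/LoopAnalysisManager.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"

// Sketch: mirror an optimization level into function attributes, then run the
// standard PassBuilder pipeline for that level. In this change the attributes
// are added by the frontend during lowering rather than in a post-pass like
// this, and --opt=none still respects always_inline, which this doesn't model.
static auto OptimizeModule(llvm::Module& module, llvm::OptimizationLevel level)
    -> void {
  for (llvm::Function& fn : module) {
    if (fn.isDeclaration()) {
      continue;
    }
    if (level == llvm::OptimizationLevel::O0) {
      // optnone must be paired with noinline to satisfy the IR verifier.
      fn.addFnAttr(llvm::Attribute::NoInline);
      fn.addFnAttr(llvm::Attribute::OptimizeNone);
    } else if (level.getSizeLevel() > 0) {
      fn.addFnAttr(llvm::Attribute::OptimizeForSize);
      if (level.getSizeLevel() > 1) {
        fn.addFnAttr(llvm::Attribute::MinSize);
      }
    }
  }

  // Standard PassBuilder setup, then the default pipeline for the level.
  llvm::LoopAnalysisManager lam;
  llvm::FunctionAnalysisManager fam;
  llvm::CGSCCAnalysisManager cgam;
  llvm::ModuleAnalysisManager mam;
  llvm::PassBuilder pb;
  pb.registerModuleAnalyses(mam);
  pb.registerCGSCCAnalyses(cgam);
  pb.registerFunctionAnalyses(fam);
  pb.registerLoopAnalyses(lam);
  pb.crossRegisterProxies(lam, fam, cgam, mam);

  llvm::ModulePassManager mpm =
      (level == llvm::OptimizationLevel::O0)
          ? pb.buildO0DefaultPipeline(level)
          : pb.buildPerModuleDefaultPipeline(level);
  mpm.run(module, mam);
}
```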

@zygoloid zygoloid requested a review from a team as a code owner October 15, 2025 21:26
@zygoloid zygoloid requested review from danakj and removed request for a team October 15, 2025 21:26
@zygoloid zygoloid requested a review from chandlerc October 15, 2025 21:26
Contributor

@danakj left a comment

LGTM overall, but will leave for Chandler to have a look too.

Some feedback on a few things below.

[&](auto& arg_b) {
  arg_b.SetOneOf(
      {
          // We intentionally don't expose O2 and Os.
Contributor

FWIW, as a data point, Chromium uses -Os over -Oz for most of its components on most build targets (and -O2 for some component objects that are more perf-sensitive): https://source.chromium.org/chromium/chromium/src/+/main:build/config/compiler/BUILD.gn;l=2806;drc=b940e533f03fffbcd16aad86131dcb91abaa21c4

I think we will find that we need to expose both for existing projects to migrate to using Carbon as their C++ toolchain. Maybe as a more advanced option, like --opt=llvm:Os, which lets you dig down into the specific LLVM levels, breaking past any encapsulation Carbon might provide? Because the levels are presented nicely as you've done here otherwise. Though I wonder what that means for optimizations we write in the Carbon frontend one day.

I am not as sure about -O2 vs -O3. I thought -O3 was generally avoided for being buggy, so I was surprised to see it here and assume my information is outdated. But I see Chromium uses -O2 for perf-sensitive components: https://source.chromium.org/chromium/chromium/src/+/main:build/config/compiler/BUILD.gn;l=2823;drc=b940e533f03fffbcd16aad86131dcb91abaa21c4

Contributor Author

The suggestion to only support the equivalents of -Oz and -O3, and not -Os or -O2, came from @chandlerc; I could only give second-hand rationale for this decision. (I'm personally fine with having both, assuming we can provide reasonable semantic descriptions of what they do, rather than "try -O2 and -O3 and pick whichever one is faster in practice", which I've found to be the case for Clang.)

Contributor

I think there are two things we're conflating here...

C++ code built with the packaged Clang toolchain should be able to use the Clang optimization levels exactly. And even using carbon clang++ -- -Os and such should work, and we should make sure that overrides whatever we set here; at most, this should establish a default for a clang invocation.

This flag is more load bearing for Carbon-specific optimization. My hope is that things like generic coalescing and other size improvements will be enough of a size win that in Carbon code we can largely use a single "aggressive" -O3.

I'd really like to let users provide a profile, even a file-granularity profile, to control between the finer granularities of -Os, -O2, and -O3 long term. We could even have annotations in the file itself to help control this, rather than having to configure a command line flag differently. Does that make sense?

Contributor

However, it might be nice to leave some documentation to this effect here:

Suggested change
// We intentionally don't expose O2 and Os.
// We intentionally don't expose O2 and Os. The difference
// between these levels tends to reflect what achieves the
// best speed for a specific application, as they all
// largely optimize for speed as the primary factor.
//
// Instead of controlling this with more nuanced flags, we
// plan to support profile and in-source hints to the
// optimizer to adjust its strategy in the specific places
// where the default doesn't have the desired results.

Could also consider moving some or all of this to the documentation string instead of a comment.

Contributor

Yes, that makes sense and addresses my concerns.

  if (!codegen) {
    return false;
  }
  CARBON_CHECK(module_, "Must call RunLower first");
Contributor

Suggested change
CARBON_CHECK(module_, "Must call RunLower first");
CARBON_CHECK(module_, "Must call PostLower first");

Maybe? Should PostLower have some bool side effect we could check here?

Contributor Author

I don't know how valuable these CHECKs are at all; I only added them for consistency with the existing CHECKs. But we're not actually depending on the side effects of PostLower here, only those of RunLower, and we're only checking that RunLower was run.

If we want something better here, perhaps we could add general tracking of the most recent phase and ensure we run them in the right order?

Contributor

I would prefer that, yeah. A nice enum would be much clearer.
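
For illustration, a minimal sketch of that kind of phase tracking (the `Phase` enum, `AdvanceTo`, and the phase names are hypothetical, not the existing CompilationUnit API):

```cpp
#include <cstdint>

// Sketch: an ordered enum of compile phases, so a single CHECK can assert both
// that the prerequisite phase ran and that phases run in sequence.
enum class Phase : uint8_t { Check, Lower, Opt, CodeGen };

class CompilationUnit {
 public:
  auto RunLower() -> void {
    AdvanceTo(Phase::Lower);
    // ... existing lowering ...
  }

  auto RunOpt() -> void {
    AdvanceTo(Phase::Opt);
    // ... run the optimization pipeline ...
  }

  auto RunCodeGen() -> void {
    AdvanceTo(Phase::CodeGen);
    // ... existing codegen ...
  }

 private:
  // CHECKs that `next` is the phase immediately after the last one run.
  auto AdvanceTo(Phase next) -> void {
    CARBON_CHECK(static_cast<int>(next) == static_cast<int>(phase_) + 1,
                 "Phases run out of order");
    phase_ = next;
  }

  Phase phase_ = Phase::Check;
};
```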

}

auto CompilationUnit::RunCodeGen() -> void {
  CARBON_CHECK(module_, "Must call RunLower first");
Contributor

Should this be PostLower?

Should PostLower have some bool side effect we could check here?

Comment on lines +629 to +632
  // Convert --opt=size into optsize and minsize.
  if (context().opt_level().getSizeLevel() > 0) {
    attr_builder.addAttribute(llvm::Attribute::OptimizeForSize);
  }
Contributor

It seems like it would be nicer to plumb through Carbon's optimization level (size, speed, etc.) and then work with that here, instead of trying to reverse-engineer it from the LLVM opt level. Would that make sense?

If not, maybe we could break this mapping back and forth into a couple helper functions so the logic is consolidated and we can maintain a clear reverse mapping?

Contributor

Agree, especially as I think the OptimizeForSize code path here is dead?
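
For illustration, one shape the consolidated mapping could take, assuming a Carbon-side mode enum gets plumbed through (`OptMode` and both helper functions are hypothetical names, not what this PR does):

```cpp
#include <cstdint>

#include "llvm/IR/Attributes.h"
#include "llvm/Passes/OptimizationLevel.h"
#include "llvm/Support/ErrorHandling.h"

// Hypothetical Carbon-side optimization mode, carried through lowering instead
// of reverse-engineering intent from the LLVM level.
enum class OptMode : uint8_t { None, Debug, Size, Speed };

// One place owns the mode -> LLVM pipeline level mapping...
static auto ToLlvmLevel(OptMode mode) -> llvm::OptimizationLevel {
  switch (mode) {
    case OptMode::None:
      return llvm::OptimizationLevel::O0;
    case OptMode::Debug:
      return llvm::OptimizationLevel::O1;
    case OptMode::Size:
      return llvm::OptimizationLevel::Oz;
    case OptMode::Speed:
      return llvm::OptimizationLevel::O3;
  }
  llvm_unreachable("unknown OptMode");
}

// ...and one place owns the mode -> function attribute mapping. (In practice
// the None case would be skipped for always_inline functions.)
static auto AddOptAttrs(OptMode mode, llvm::AttrBuilder& attr_builder) -> void {
  switch (mode) {
    case OptMode::None:
      attr_builder.addAttribute(llvm::Attribute::NoInline);
      attr_builder.addAttribute(llvm::Attribute::OptimizeNone);
      break;
    case OptMode::Size:
      attr_builder.addAttribute(llvm::Attribute::OptimizeForSize);
      attr_builder.addAttribute(llvm::Attribute::MinSize);
      break;
    case OptMode::Debug:
    case OptMode::Speed:
      break;
  }
}
```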

zygoloid and others added 2 commits October 16, 2025 10:47
@eyelash

eyelash commented Oct 17, 2025

Since this command line option is part of the user interface of the Carbon compiler and opt can be an abbreviation for both option/optional and optimization, would it make sense to use --optimize instead? This is also what Clang uses. I imagine that seeing --opt=none without context or familiarity with the Carbon compiler could easily be confusing, especially since None is one of the cases of Optional.

@danakj
Contributor

danakj commented Oct 17, 2025

> Since this command line option is part of the user interface of the Carbon compiler and opt can be an abbreviation for both option/optional and optimization, would it make sense to use --optimize instead? This is also what Clang uses. I imagine that seeing --opt=none without context or familiarity with the Carbon compiler could easily be confusing, especially since None is one of the cases of Optional.

FWIW, Rust (which has a similar Option/None quandary) uses -C opt-level.


Comment on lines +799 to +802
// TODO: Should we do this earlier? Perhaps Lower should be passed the target
// triple so it can create the module with this already set.
llvm::Triple target_triple(options_->codegen_options.target);
module_->setTargetTriple(target_triple);
Contributor

Yes, we should do this earlier IMO.

No need to make the change here unless you want to, but I don't think we save anything by deferring this, and it seems cleaner to always have the target in the module.
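
For illustration, a sketch (hypothetical names, not this PR's code) of what it could look like for lower to create the module with the triple already set:

```cpp
#include <memory>

#include "llvm/ADT/StringRef.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/TargetParser/Triple.h"

// Sketch: if Lower is handed the target, the module can be created with the
// triple already set, and codegen no longer needs to patch it in afterwards.
static auto CreateLoweredModule(llvm::LLVMContext& context,
                                llvm::StringRef name, llvm::StringRef target)
    -> std::unique_ptr<llvm::Module> {
  auto module = std::make_unique<llvm::Module>(name, context);
  module->setTargetTriple(llvm::Triple(target));
  return module;
}
```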

Comment on lines +816 to +821
// TODO: A lot of the work done here duplicates work done by Clang setting up
// its pass manager. Moreover, we probably want to pick up Clang's
// customizations and make use of its flags for controlling LLVM passes. We
// should consider whether we would be better off running Clang's pass
// pipeline rather than building one of our own, or factoring out enough of
// Clang's pipeline builder that we can reuse and further customize it.
Contributor

FWIW, I'm skeptical we want to do too much of this...

I suspect we both want to:

a) Be a bit more opinionated about having one consistent way of running things, and take the opportunity to remove some accumulated technical debt in how Clang sets things up.

b) Allow adding Carbon-specific customization that doesn't make sense for Clang's pipeline.

Factoring this layer of Clang to share code has historically been quite difficult. The most we've factored out is into LLVM's pass builder. Maybe that's the better target for further factoring?

Contributor Author

I think there's a real problem here that we will eventually need to address. Whatever pass pipeline we build will be used for both the IR that we generate and the IR that Clang generates, and we're allowing arbitrary flags, including optimization-control flags, to be passed to Clang in order to affect how it generates IR. For example, we can pass the -f[no-]vectorize flag to Clang while building Carbon code that imports C++, and it seems reasonable to expect that to at least affect vectorization in the C++ portion of the compilation.

Making this more complicated, the impact of the various optimization flags is split across IR generation and pass pipeline construction -- for example, a lot of the difference between -O2 and -Os is in whether the function definitions are marked optsize, rather than whether we're using the -Os pass pipeline, but both are relevant. So we don't really have the luxury of saying that Carbon's optimization flags override Clang's or vice versa -- in practice, for inline functions in C++ code used from Carbon, we will be using a fairly arbitrary mixture of the two sets of flags (some affecting the IR attributes, some affecting the pass pipeline), which seems like it's going to lead to problems if they can be misaligned. (We also currently have a weird setup where both the Clang pass pipeline and the Carbon pass pipeline will be run on C++ code used from Carbon, but I'm expecting that eventually we'll have both frontends generate IR into the same module and stop using the LLVM linker, and at that point we'll likely only run a single pass pipeline.)

For example, as concrete consequences (I've not tested these, but this is how I think these various behaviors will combine):

  • carbon compile --opt=speed won't optimize inline C++ code in the Carbon compile, because Clang defaults to -O0, which puts the LLVM optnone attribute on everything.
  • carbon compile --opt=speed --clang-arg=-O1 applies -O3 optimization to the C++ code, not -O1, because we run the Carbon pass pipeline on the C++ code, and nothing in the C++ code prevents the full -O3 optimizations from firing.
  • carbon compile --opt=size --clang-arg=-O1 approximately applies -O2 optimization to the C++ code, because --opt=size means -O2 plus optsize+minsize attributes on all function definitions, but those attributes aren't added to the code generated by clang.
  • carbon compile --opt=speed --clang-arg=-Oz will not vectorize code in C++ inline functions because -Oz applies optsize+minsize to the IR.
  • carbon compile --opt=speed --clang-arg=-O2 --clang-arg=-fno-vectorize will vectorize code in C++ inline functions because -fno-vectorize affects the pass pipeline not function attributes.

This mixture of pass pipeline behavior and per-function attributes means that what we get is going to be hard to explain to a user.

It seems to me that the cleanest available way out of this is to have both frontends agree on the optimization level and the implications that has on (e.g.) the function attributes that are generated and the pass pipeline that's used. And I think the simplest way to get there would be to ask Clang to build the pass pipeline, so that it can make consistent use of its flags for controlling whether various kinds of optimizations fire or not. In terms of user experience, that'd mean that both Carbon flags and Clang flags can be used to control optimization, and either way, they affect optimization settings for both Carbon and Clang code in that compilation.

But maybe there's another approach that will result in behavior that we can reasonably explain to the user? For example, if we could use the Clang pass pipeline for the C++ code and the Carbon pass pipeline for the Carbon code, that might be reasonable, but I don't think that's really possible without some major changes to LLVM. I suppose we could map the --opt= flag into a corresponding --clang-arg, and tell people that if they add clang-specific optimization flags, we don't provide any guarantees about what happens, but that seems a little unsatisfying.

Contributor

We chatted a bunch about this in person.... Trying to capture that somewhat here.

Basically, yes, this is messy. I understand it feeling cleanest is to cause Clang to build the pass pipeline we want, and then use that. But I'd like to try to get to a place where we have a bit more control over the optimizations on the Carbon side.

For example, it'd be nice if we could do Carbon-specific optimization passes or other customization without needing to find a way to plumb that through Clang's pipeline creation.

Sadly, I don't see a good way to have separate pipelines for the C++ and Carbon side -- the interop boundary is going to be a really sharp edge that messes lots of things up. We discussed even acting like a mini-LTO with both pre-merge and post-merge pipelines, but that will result in very different optimizations due to phase ordering, have a big compile-time hit, and not even fix many of the annoyances, because so many of the pipeline-only facets occur at the end of the pipeline where it would be shared anyway.

The history is that we've been trying to move more and more of the interesting cases here to be controlled by attributes and other things in the IR itself that can meaningfully merge. This is necessary to get good behavior from eventual LTO already. But there may be some things that don't make sense in that model, and even when they do make sense, not everything has moved.


To also summarize the forward direction we discussed:

  • I think we want Carbon's optimization flags to determine any whole-pipeline settings, not Clang's.
  • I think we should intercept the LLVM IR before Clang runs its pipeline by setting CodeGenOpts.DisableLLVMPasses on the Clang IR generation. This still lets Clang do any optimized emission of LLVM IR and put optnone or optsize or whatever other attribute-based Clang-level flags are passed.
  • We should synthesize defaults for Clang from Carbon's flags to avoid surprisingly getting Clang -O0 behavior by default.
  • But we should allow user Clang flags to override this, caveat emptor. The Clang flags which are modeled using mergeable per-function attributes will work and govern the inline C++ code. The flags that currently only impact the translation unit behavior will be ignored because this is fundamentally a Carbon translation unit.

I think this is roughly what we discussed, but let me know if I misremembered anything.
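
For illustration, a rough sketch of the interception step described above, assuming direct access to the embedded Clang invocation (`clang_invocation` and `user_set_clang_opt_flag` are made-up names, not the toolchain's actual API):

```cpp
#include "clang/Basic/CodeGenOptions.h"
#include "clang/Frontend/CompilerInvocation.h"
#include "llvm/Passes/OptimizationLevel.h"

// Sketch: let Clang emit IR with its attribute-level decisions (optnone,
// optsize, minsize, ...) baked in, but skip its own LLVM pass pipeline so that
// Carbon's `opt` phase owns all pipeline-level settings.
static auto ConfigureClangCodeGen(clang::CompilerInvocation& clang_invocation,
                                  llvm::OptimizationLevel carbon_level,
                                  bool user_set_clang_opt_flag) -> void {
  clang::CodeGenOptions& cg_opts = clang_invocation.getCodeGenOpts();

  // Intercept the IR before Clang would run its pipeline.
  cg_opts.DisableLLVMPasses = true;

  // Synthesize Clang defaults from Carbon's flag so we don't silently get -O0
  // behavior (optnone on everything); explicit user Clang flags still win.
  if (!user_set_clang_opt_flag) {
    cg_opts.OptimizationLevel = carbon_level.getSpeedupLevel();
    cg_opts.OptimizeSize = carbon_level.getSizeLevel();
  }
}
```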


arg_b.OneOfValue("none", llvm::OptimizationLevel::O0),
arg_b.OneOfValue("debug", llvm::OptimizationLevel::O1),
arg_b.OneOfValue("speed", llvm::OptimizationLevel::O3),
arg_b.OneOfValue("size", llvm::OptimizationLevel::Oz),
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that LLVM calls this min_size, what do you think about that name? I could go either way.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a little inconsistent to me to use min_size and speed -- either min_size and max_speed or size and speed seem more consistent -- and the min_ seems somewhat unnecessary to clarify (we're not going to optimize to maximize size...). So I lean towards just --opt=size. But definitely not a strong opinion; I'm pretty much happy with whatever here.

Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K, let's keep it size for now? Not that hard to revisit this if it ends up confusing folks.

@chandlerc
Contributor

> Since this command line option is part of the user interface of the Carbon compiler and opt can be an abbreviation for both option/optional and optimization, would it make sense to use --optimize instead? This is also what Clang uses. I imagine that seeing --opt=none without context or familiarity with the Carbon compiler could easily be confusing, especially since None is one of the cases of Optional.

FWIW, I'd be fine changing the flag to either --optimize or --opt-level. No strong opinion there. Same for the phase name.
