implement `carryless_mul` by folkertdev · Pull Request #152132 · rust-lang/rust

folkertdev · 2026-02-04T18:47:24Z

tracking issue: #152080
ACP: rust-lang/libs-team#738

This defers to LLVM's llvm.clmul when available, and otherwise falls back to a method from the polyval crate (link).

Some things are missing, which I think we can defer:

the ACP has some discussion about additional methods, but I'm not sure exactly what is wanted or how to implement it efficiently
the SIMD intrinsic is not yet const (I think I ran into a bootstrapping issue). That is fine for now, I think in stdarch we can't really use this intrinsic at the moment, we'd only want the scalar version to replace some riscv intrinsics.
the SIMD intrinsic is not implemented for the gcc and cranelift backends. That should be reasonably straightforward once we have a const eval implementation though.

rustbot · 2026-02-04T18:47:27Z

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

⚠️ #[miri::intrinsic_fallback_is_spec] must only be used if the function actively checks for all UB cases,
and explores the possible non-determinism of the intrinsic.

cc @rust-lang/miri

Some changes occurred to the platform-builtins intrinsics. Make sure the
LLVM backend as well as portable-simd gets adapted for the changes.

cc @antoyo, @GuillaumeGomez, @bjorn3, @calebzulawski, @programmerjake

rustbot · 2026-02-04T18:47:30Z

r? @scottmcm

rustbot has assigned @scottmcm.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

programmerjake · 2026-02-04T20:14:09Z

library/core/src/intrinsics/fallback.rs

+                use crate::num::Wrapping;
+
+                // i.e. 0b100010001...0001 in binary.
+                const MASK: u128 = 0x1111_1111_1111_1111_1111_1111_1111_1111;


Suggested change

const MASK: u128 = 0x1111_1111_1111_1111_1111_1111_1111_1111;

const MASK: u128 = !0 / 0xF;

seems easier to read

Is it? When I look at that expression I'd have to trust the comment that it does what it says. The 0x1111 literal is long but to me it's clearer.

okaneco · 2026-02-04T22:49:56Z

the ACP has some discussion about additional methods, but I'm not sure exactly what is wanted or how to implement it efficiently

My understanding was that widening_carryless_mul was accepted. It produces a result with the low and high order bits in the next widest integer, with a signature of fn(u<n>, u<n>) -> u<2*n> based on this comment rust-lang/libs-team#738 (comment):

...we would like the widening method to return a larger integer size and not be available on u128.

The tuple returning method is a possible future expansion, not intended to be currently implemented.

I don't know the best way to implement the widening operation, but we can get the high-order bits using clmul as a primitive. In LLVM's arbitrary precision integer library, clmulr (reversed) and clmulh (high-order bits) are defined in terms of clmul like this.

clmulr(a, b) = bitreverse(clmul(bitreverse(a), bitreverse(b)))
clmulh(a, b) = clmulr(a, b) >> 1

https://github.com/llvm/llvm-project/blob/66ba9c9475c7628893f387f1b9d9d3056b05d995/llvm/include/llvm/ADT/APInt.h#L2457-L2476
https://github.com/llvm/llvm-project/blob/66ba9c9475c7628893f387f1b9d9d3056b05d995/llvm/lib/Support/APInt.cpp#L3202-L3220

Folds exist for clmulr/clmulh ISD nodes
clmulh
clmulr

folkertdev · 2026-02-04T23:35:23Z

Right, it's just the first widening function that would have that signature. If that is the plan then it's pretty simple: you just widen first and then do clmul on the widened type. That can never overflow.

For carrying_carryless_mul (also requested), there presumably is some way to get that carry bit more efficiently than performing the wide clmul. Like, not doing that is the whole point of that function.

This is running into CI sometimes using earlier LLVM releases without the new intrinsic. I asked in #t-compiler/llvm > Adding an LLVM 22 intrinsic how/if we can work around that.

programmerjake · 2026-02-05T00:52:34Z

For carrying_carryless_mul (also requested), there presumably is some way to get that carry bit more efficiently than performing the wide clmul. Like, not doing that is the whole point of that function.

actually I expect you can probably just use clmul on LLVM's i256 type for u128 and then extract the high and low halves. That's what carrying_mul does:
https://rust.godbolt.org/z/K5578vEnn

; cleaned up slightly
define void @u128_carrying_mul(ptr sret([32 x i8]) align 16 %_0, i128 noundef %a, i128 noundef %b, i128 noundef %c) unnamed_addr #0 {
start:
  %0 = zext i128 %a to i256
  %1 = zext i128 %b to i256
  %2 = zext i128 %c to i256
  %3 = mul nuw i256 %1, %0
  %4 = add nuw i256 %3, %2
  %5 = trunc i256 %4 to i128
  %6 = lshr i256 %4, 128
  %7 = trunc nuw i256 %6 to i128
  store i128 %5, ptr %_0, align 16
  %_0.repack1 = getelementptr inbounds nuw i8, ptr %_0, i64 16
  store i128 %7, ptr %_0.repack1, align 16
  ret void
}

quaternic · 2026-02-05T00:52:40Z

library/core/src/intrinsics/fallback.rs

+        /// This approach uses a bitmask of the form `0b100010001...0001` to avoid carry spilling.
+        /// When carries do occur, they wind up in a "hole" of zeros and are subsequently masked
+        /// out of the result.


This approach with 4-bit digits works up to integers with 4 * 15 = 60 bits. Past that, one digit can overflow to the next.

For u64, it does actually work for this "non-widening" operation, since the top digit may be computed as 16, but there is no next digit that would be affected. The wide result would be erroneous however. E.g. x.carryless_mul(x) with x = MASK as u64 as u128.

The impl for u128::carryless_mul is currently incorrect for that reason. You could probably extend the approach to use 5-bit digits, but it's likely better to just implement it in terms of u64::carryless_mul.

some tests against a naive impl: playground

Can we add a const { assert!(std::mem::size_of::<Self>() <= 8); } here or so to make it less likely we accidentally merge this incorrectly in the future?

RalfJung · 2026-02-05T08:21:45Z

library/core/src/intrinsics/mod.rs

    unsafe { a.unchecked_funnel_shr(b, shift) }
 }

+/// Carryless multiply.


Is this meant to be a self-contained description? To me these words mean nothing.^^

compiler/rustc_codegen_llvm/src/intrinsic.rs

RalfJung · 2026-02-05T08:25:54Z

library/core/src/num/uint_macros.rs

+        #[doc = concat!("let a = ", $clmul_lhs, stringify!($SelfT), ";")]
+        #[doc = concat!("let b = ", $clmul_rhs, stringify!($SelfT), ";")]
+        ///
+        #[doc = concat!("assert_eq!(a.carryless_mul(b), ", $clmul_result, ");")]


The only example here uses huge numbers which means the example is entirely useless if one doesn't already know what this operation does. I for one still have no clue after reading all this.^^

Which carry is discarded? All the ones that show up when adding up the results of the first steps of long multiplication? Or also all the carries that occur within the long multiplication steps? This needs way more detail. Doesn't that mean the result depends on the base? No base is mentioed in the first 2 paragraphs, and even in the third paragraph it's only mentioned in an obscure way.

maybe describe it as equivalent to:

impl uN { pub fn carryless_mul(self, other: Self) -> Self { let mut retval = 0; for i in 0..Self::BITS { if (other >> i) & 1 != 0 { retval ^= self << i; } } retval } }

That is a good specification. I haven't really found/been able to come up with a good intuitive explanation, and even if you understand the mechanics, how/when to apply this function is still hard to see.

The only example here uses huge numbers

So in part that is to test that the implementation doesn't just mask off the upper bits. Because of how these functions are generated with a macro, getting examples to work with type inference and without overflowing literals is a bit finicky.

I'll try again to come up with something more insightful.

Regular long multiplication would be using += instead of ^=, right?

I have no idea why that would be called "carryless", but that does seem like a reasonable way to explain this, yeah. And maybe we should also do the const-eval/Miri implementation that way to avoid relying on the fallback impl that seems to want to be clever. We generally don't want to be clever for intrinsics in Miri.

Regular long multiplication would be using += instead of ^=, right?

Exactly.

It is carryless in the sense that the sum of two bits (x,y) produces a carry bit = x & y and the wrapped sum = x ^ y, so using ^ instead of + is just throwing out all the carries of bitwise addition. That is, consider the equivalence x + y = (x ^ y) + ((x & y) << 1).

The connection to polynomial multiplication is that there you naturally don't carry between the coefficients: (x + 1)(x + 1) = x^2 + 2x + 1, and the 2x doesn't just turn into x^2. Rather, if you're computing the coefficients modulo 2, you get 2x = 0x. Indeed, 0b11.carryless_mul(0b11) == 0b101, where the polynomials are represented by their valuation at x = 2.

programmerjake · 2026-02-05T14:55:45Z

maybe look at the latest C++ proposal for a bunch of good examples: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3642r3.html

library/core/src/intrinsics/fallback.rs

…rries)

rustbot · 2026-02-06T00:38:47Z

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

Some changes occurred to the CTFE machinery

cc @RalfJung, @oli-obk, @lcnr

rustbot · 2026-02-06T00:38:48Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

programmerjake · 2026-02-06T04:09:07Z

library/core/src/intrinsics/fallback.rs

+    fn carryless_mul(self, rhs: Self) -> Self {
+        // For u128 the 0b100010001...0001 trick above does not work, so we use an implementation
+        // that uses 64-bit carryless multiplication.
+        karatsuba_u128(self, rhs).1


you should be able to have a much more efficient implementation if you also have a carryless_mul_high (which could also be a useful intrinsic itself):
fixed version in #152132 (comment)

original broken version

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2024&gist=5a3ff332b69359205c9b7e987f4cff12

(edit: the below carryless_mul_high has the overflow bug: https://github.com/rust-lang/rust/pull/152132/changes#r2766586213)

impl const CarrylessMul for u128 { #[inline] fn carryless_mul(self, rhs: Self) -> Self { let l = u64::carryless_mul(self as u64, rhs as u64); let lh = u64::carryless_mul(self as u64, (rhs >> 64) as u64); let hl = u64::carryless_mul((self >> 64) as u64, rhs as u64); let h = lh ^ hl ^ carryless_mul_high(self as u64, rhs as u64); ((h as u128) << 64) | l as u128 } } const fn carryless_mul_high(x: u64, y: u64) -> u64 { const fn mul_h(x: u64, y: u64) -> u64 { x.carrying_mul(y, 0).1 } // i.e. 0b100010001...0001 in binary. const MASK: u64 = 0x1111_1111_1111_1111u64; const M0: u64 = MASK; const M1: u64 = M0 << 1; const M2: u64 = M1 << 1; const M3: u64 = M2 << 1; let x0 = x & M0; let x1 = x & M1; let x2 = x & M2; let x3 = x & M3; let y0 = y & M0; let y1 = y & M1; let y2 = y & M2; let y3 = y & M3; let z0 = mul_h(x0, y0) ^ mul_h(x1, y3) ^ mul_h(x2, y2) ^ mul_h(x3, y1); let z1 = mul_h(x0, y1) ^ mul_h(x1, y0) ^ mul_h(x2, y3) ^ mul_h(x3, y2); let z2 = mul_h(x0, y2) ^ mul_h(x1, y1) ^ mul_h(x2, y0) ^ mul_h(x3, y3); let z3 = mul_h(x0, y3) ^ mul_h(x1, y2) ^ mul_h(x2, y1) ^ mul_h(x3, y0); (z0 & M0) | (z1 & M1) | (z2 & M2) | (z3 & M3) }

I think this won't quite work for u64 (or larger) because of the reasons in #152132 (comment)
Specifically, if in some mul_h(xi, yj) both inputs have 16 bits set (so xi == Mi and yj == Mj), then the polynomial product will have a coefficient of 16 at index 64 + i + j, which will carry a one to the coefficient at 68 + i + j. There might be some trick to make it work, but that sounds complex. I'd first consider implementing the widening version u32 X u32 -> u64, and then extending that to larger sizes with karatsuba.

edit: Oh, you had already noticed, and GH just doesn't automatically reload threads.

Anyway, I do agree that having either carryless_mul_high or widening_carryless_mul or some such at least internally is a good idea, since that is then easily extended to larger sizes.

ok, here's a fixed version that incorporates a check for if the multiply would overflow the 4-bit parts:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2024&gist=e24ec0122afcca4b8b05239aa7f7c3c4

impl const CarrylessMul for u128 { #[inline] fn carryless_mul(self, rhs: Self) -> Self { let l = u64::carryless_mul(self as u64, rhs as u64); let lh = u64::carryless_mul(self as u64, (rhs >> 64) as u64); let hl = u64::carryless_mul((self >> 64) as u64, rhs as u64); let h = lh ^ hl ^ carryless_mul_high(self as u64, rhs as u64); ((h as u128) << 64) | l as u128 } } const fn carryless_mul_high(x: u64, y: u64) -> u64 { // i.e. 0b100010001...0001 in binary. const MASK: u64 = 0x1111_1111_1111_1111u64; const M0: u64 = MASK; const M1: u64 = M0 << 1; const M2: u64 = M1 << 1; const M3: u64 = M2 << 1; macro_rules! mul { ($x_mask_shift:literal, $y_mask_shift:literal) => {{ let x = x & (MASK << $x_mask_shift); let y = y & (MASK << $y_mask_shift); if x == MASK << $x_mask_shift && y == MASK << $y_mask_shift { // only case where the multiply overflows the 4-bit parts 0x0101_0101_0101_0101u64 << ($x_mask_shift + $y_mask_shift) } else { x.carrying_mul(y, 0).1 } }}; } let z0 = mul!(0, 0) ^ mul!(1, 3) ^ mul!(2, 2) ^ mul!(3, 1); let z1 = mul!(0, 1) ^ mul!(1, 0) ^ mul!(2, 3) ^ mul!(3, 2); let z2 = mul!(0, 2) ^ mul!(1, 1) ^ mul!(2, 0) ^ mul!(3, 3); let z3 = mul!(0, 3) ^ mul!(1, 2) ^ mul!(2, 1) ^ mul!(3, 0); (z0 & M0) | (z1 & M1) | (z2 & M2) | (z3 & M3) }

Huh, that should indeed work. I do wonder how much the codegen suffers from the extra checks, but I suppose that's something to examine with actual benchmarks.

if the ifs are replaced with select_unpredictable, it seems pretty reasonable: https://rust.godbolt.org/z/9q9oTf9T3

carrying_carryless_mul is for when you want a bigint carryless-mul:

fn carryless_mul_u1024_by_u64(a: &mut [u64; 16], b: u64) { let mut carry = 0; for a in a { (*a, carry) = a.carrying_carryless_mul(b, carry); } }

it's just like:

fn mul_u1024_by_u64(a: &mut [u64; 16], b: u64) { let mut carry = 0; for a in a { (*a, carry) = a.carrying_mul(b, carry); } }

ah, the carry is just a convenience really. Makes sense, I added an implementation now. u128 needs some special care, and currently LLVM will not optimize it to use a 128-bit or 256-bit clmul. So we might still want the -> (Self, Self) standard widening variant as an intrinsic.

It is indeed just convenience, unlike normal multiplication where it's useful for handling the possible carry from the low bits to the high bits. With carryless_mul, you can just ^ the carry to the low bits of the result.

The -> (uN, uN) signature can be convenient at times, and so can -> u2N. An alternative convenience API might be something to split an integer to its low and high halves. E.g. in libm we have u2N::lo_hi(self) -> (uN, uN) as an internal utility. With that, the above would be written

fn carryless_mul_u1024_by_u64(a: &mut [u64; 16], b: u64) { let mut carry = 0; for a in a { let (lo, hi) = a.widening_carryless_mul(b).lo_hi(); (*a, carry) = (lo ^ carry, hi); } }

The name carrying_carryless_mul is just so nonsensical I'd prefer not having it. Thing is, the carrying in carrying_{mul,add} is there because the operation takes a carry in and carries the sum to the high bits to produce a carry out, and the operation is useful because the latter depends on the former. That's not the case with *_carryless_mul.

while I agree that carrying_carryless_mul is a bad name, I think we do need some method on uN that returns the high half (and the low half too if desired) of the product as uN. widening_carryless_mul is insufficient since it isn't implemented for u128 since Rust has no u256 type.

in the hypothetical future where Rust gains generic integer types so there's a u<N> for every non-negative integer N, I think we still will want a method that returns the same u<N> rather than depending on const generic expressions somehow being made to work to return u<{ N * 2 }>.

Yes, that's a good point. Personally I'd be fine with widening_{mul, carryless_mul} that return (uN, uN).

Another might be to make the return type something richer than just a tuple; something like struct WideInt<H, L> { low: L, high: H } could be generally useful (i128 ~= WideInt<i64, u64>, Duration ~= WideInt<u64, Nanos>). Might be too weird for an inherent method on a primitive integer but 🤷

quaternic · 2026-02-06T04:59:26Z

library/core/src/intrinsics/fallback.rs

+}
+
+#[rustc_const_unstable(feature = "core_intrinsics_fallbacks", issue = "none")]
+const fn karatsuba_u128(h: u128, y: u128) -> (u128, u128) {


Firstly, this is returning (high, low), whereas e.g. uN::carrying_mul uses (low, high), which I find preferable. Either way, a comment noting that wouldn't hurt.

Secondly, I think this could be improved by outlining the definition of the widening clmul for the half-sized integer. Let's say

const fn widening_clmul_u64(x: u64, y: u64) -> (u64, u64) { let lo = x.carryless_mul(y); let xr = x.reverse_bits(); let yr = y.reverse_bits(); let hi = xr.carryless_mul(yr).reverse_bits() >> 1; (lo, hi) }

Finally, computing the high bits of CLMUL this way may be quite slow on targets without a fast bit-reversal.

Finally, computing the high bits of CLMUL this way may be quite slow on targets without a fast bit-reversal.

FWIW BearSSL also implemented its own SWAR-style bit reversal:

https://github.com/unkaktus/bearssl/blob/6a1dab03beac9a324fdaed36ffdeb5081ac64b9a/src/hash/ghash_ctmul64.c#L58-L74

(See also rev32)

Finally, computing the high bits of CLMUL this way may be quite slow on targets without a fast bit-reversal.

There isn't that much we can do about that though, right? We already let LLVM decide the codgen in most cases, for miri/const evaluation we're using the simpler implementation, so it's really just gcc and cranelift where this implementation would even be used.

I meant that it might be better to avoid reversing bits at all. I was thinking that the reversing could be significant extra work, though it might actually be quite minor relative to the clmuls.

The current impl for u128 using Karatsuba in this PR does 6 u64::carryless_mul and 7 u64::reverse_bits, but it could be written with 3 uses of u64::widening_carryless_mul instead.

Hence the suggestion to implement widening_carryless_mul directly for some size where it makes the most sense (probably either u32 or u64), and use Karatsuba to extend that to wider sizes.

fn widening_carryless_mul_u64_karatsuba(x: u64, y: u64) -> u128 { let x0 = x as u32; let x1 = (x >> 32) as u32; let y0 = y as u32; let y1 = (y >> 32) as u32; let z0 = widening_carryless_mul_u32(x0, y0); let z2 = widening_carryless_mul_u32(x1, y1); // The grade school algorithm would compute: // z1 = x0y1 ^ x1y0 // Instead, Karatsuba first computes: let z3 = widening_carryless_mul_u32(x0 ^ x1, y0 ^ y1); // Since it distributes over XOR, // z3 == x0y0 ^ x0y1 ^ x1y0 ^ x1y1 // |--| |---------| |--| // == z0 ^ z1 ^ z2 // so we can compute z1 as let z1 = z3 ^ z0 ^ z2; (z0 as u128) ^ ((z1 as u128) << 32) ^ ((z2 as u128) << 64) }

Neat, I've updated u128::carrying_carryless_mul to use this idea.

folkertdev · 2026-02-06T15:51:48Z

I added widening_carryless_mul now for the types that makes sense for.

Mark-Simulacrum · 2026-02-08T19:25:12Z

library/core/src/intrinsics/fallback.rs

+        /// This approach uses a bitmask of the form `0b100010001...0001` to avoid carry spilling.
+        /// When carries do occur, they wind up in a "hole" of zeros and are subsequently masked
+        /// out of the result.


Can we add a const { assert!(std::mem::size_of::<Self>() <= 8); } here or so to make it less likely we accidentally merge this incorrectly in the future?

Mark-Simulacrum · 2026-02-08T19:35:35Z

library/core/src/intrinsics/fallback.rs

+                let z2 = (x0 * y2) ^ (x1 * y1) ^ (x2 * y0) ^ (x3 * y3);
+                let z3 = (x0 * y3) ^ (x1 * y2) ^ (x2 * y1) ^ (x3 * y0);
+
+                (z0.0 & M0) | (z1.0 & M1) | (z2.0 & M2) | (z3.0 & M3)


I see that Wikipedia and the relevant C++ paper (https://isocpp.org/files/papers/P3642R3.html) have much simpler implementations. We also provide a simpler implementation as the 'simple implementation' in the description of the public function (I think the same one C++ paper provides). I think this manual unrolling may help performance, but do we have evidence of that? Does that performance matter since in practice distros ~only ship stable toolchains and Rust will not stabilize this on LLVM < 22? Maybe we should use the simple implementation as fallback too?

If I'm reading the implementation from that C++ paper correctly, it's doing one integer multiply for every bit of the input, whereas this does 16 multiply ops. For a u64, that should be 1/4 the multiplies. It's effectively batching up the work so it can do more with each multiply.

The C++ example really seems more like a naive/idealized algorithm description designed to lower to optimized intrinsics as opposed to something you'd actually want to practically deploy as a portable implementation.

Note: the original LLVM RFC originally proposed using the above method "If the CPU does not have a dedication clmul operation, it can be lowered to regular multiplication, by using holes to avoid carrys" but it sounds like the actual portable codegen out of LLVM isn't using it (yet)?

it's doing one integer multiply for every bit of the input

Technically yes, but note that each of those multiplies is of the form x * (1 << i), which is just x << i (where i is constant if unrolled), so the comparison isn't so clear.

Right, I understand that this is batching that work. But given that this code is unreachable except on gcc + cranelift in Rust-distributed builds, it's not obvious to me how much that matters. I'd also maybe expect that llvm can optimize the naive form a bit closer to this? Not sure there.

it's reachable on builds with older llvm

Right, but it seems unlikely this will stabilize before those are largely phased out of our support? In any case, I think the main thing is adding more thorough test coverage.

Do we care enough about older LLVM to add non-trivial fast-paths for those versions?

Mark-Simulacrum · 2026-02-08T19:36:12Z

library/core/src/intrinsics/mod.rs

+#[unstable(feature = "uint_carryless_mul", issue = "152080")]
+pub const fn carryless_mul<T: [const] fallback::CarrylessMul>(a: T, b: T) -> T {
+    // NOTE: while this implementation could serve as the specification, rustc_const_eval
+    // actually implements a simpler but less efficient variant as the specification.


Why? As I ask above, maybe our fallback could be the simple spec?

library/core/src/intrinsics/simd.rs

Mark-Simulacrum · 2026-02-08T19:40:06Z

library/coretests/tests/num/uint_macros.rs

+                assert_eq_const_safe!($T: <$T>::carryless_mul(0, 0), 0);
+                assert_eq_const_safe!($T: <$T>::carryless_mul(1, 1), 1);
+
+                assert_eq_const_safe!($T: <$T>::carryless_mul(0b0100, 2), 0b1000);


This testing feels too minimal, especially for the relatively complex implementations we have. Is there a reason we can't have the trivial fallback implementation and exhaustively test at least the u16/u16 case? And maybe have a random test against a reasonably-large subset of the other integer sizes?

I didn't add that here because you'd get errors on overflowing literals when using larger inputs. Is there some place that the random tests could go?

(we do also test via the doc tests. That's not really what they're for, but they do have a mechanism to pick an input per-type so you don't run into overflowing literals)

probably in https://github.com/rust-lang/rust/tree/main/library/coretests/tests/num

RalfJung

Some comments for the const-eval part, which otherwise looks good.

View changes since this review

RalfJung · 2026-02-09T10:02:14Z

compiler/rustc_const_eval/src/interpret/intrinsics.rs

+                // perform carry-less multiplication.
+                //
+                // this operation is like long multiplication, but ignores the carries.
+                // that idea corresponds to the xor operator, which is used in the implementation.
+                //
+                // wikipedia has an example https://en.wikipedia.org/wiki/carry-less_product#example


Please use proper spelling and capitalization.

RalfJung · 2026-02-09T10:02:46Z

compiler/rustc_const_eval/src/interpret/intrinsics.rs

+                for i in 0..size.bits() {
+                    // if the i-th bit in right is set
+                    if (right >> i) & 1 != 0 {
+                        // xor result with `left` shifted to the left by i positions
+                        result ^= left << i;
+                    }
+                }


Please explain why it is okay to run this algorithm on u128 no matter the actual type. In particular, couldn't the underlying type be signed...?

RalfJung · 2026-02-09T10:03:08Z

compiler/rustc_const_eval/src/interpret/intrinsics.rs

+                // Only return the lower bits.
+                result &= u128::MAX >> (128 - size.bits());


Please use https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.ScalarInt.html#method.truncate_from_uint to get a ScalarInt with implicit truncation which you can then turn into a Scalar.

rustbot assigned scottmcm Feb 4, 2026

This comment has been minimized.

Sign in to view

programmerjake reviewed Feb 4, 2026

View reviewed changes

folkertdev force-pushed the carryless-mul branch from f564229 to 47da3e1 Compare February 4, 2026 20:24

This comment has been minimized.

Sign in to view

okaneco mentioned this pull request Feb 4, 2026

Tracking Issue for uint_carryless_mul #152080

Open

4 tasks

quaternic reviewed Feb 5, 2026

View reviewed changes

This comment was marked as resolved.

Sign in to view

rustbot assigned Mark-Simulacrum and unassigned scottmcm Feb 5, 2026

RalfJung reviewed Feb 5, 2026

View reviewed changes

RalfJung mentioned this pull request Feb 5, 2026

rustdoc-gui tests sporadically failing #93784

Open

tarcieri reviewed Feb 5, 2026

View reviewed changes

library/core/src/intrinsics/fallback.rs Outdated Show resolved Hide resolved

folkertdev added 4 commits February 6, 2026 00:01

implement carryless_mul

7d30e87

use the simple, inefficient implementation in rustc_const_eval

b11c3d5

add a custom fallback for u128 (the generic version would run into ca…

fc1448f

…rries)

use mask constants in the generic impl

3cef0de

folkertdev force-pushed the carryless-mul branch from 47da3e1 to ced05ab Compare February 6, 2026 00:38

This comment has been minimized.

Sign in to view

folkertdev added 3 commits February 6, 2026 01:51

update docs

2220317

use the fallback with earlier versions of LLVM

ecf4d3f

add carryless_mul to replaced_intrinsics

16a0add

folkertdev force-pushed the carryless-mul branch from ced05ab to 16a0add Compare February 6, 2026 00:51

This comment has been minimized.

Sign in to view

programmerjake reviewed Feb 6, 2026

View reviewed changes

quaternic reviewed Feb 6, 2026

View reviewed changes

folkertdev added 2 commits February 6, 2026 16:43

add improved u128 implementation

fbc7f99

add widening_carryless_mul

7404f05

folkertdev added 2 commits February 6, 2026 21:33

add carrying_carryless_mul

0e2315c

improve 128-bit carrying_carryless_mul

59d6fa7

Mark-Simulacrum reviewed Feb 8, 2026

View reviewed changes

Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 8, 2026

RalfJung reviewed Feb 9, 2026

View reviewed changes

	const MASK: u128 = 0x1111_1111_1111_1111_1111_1111_1111_1111;
	const MASK: u128 = !0 / 0xF;

		// Only return the lower bits.
		result &= u128::MAX >> (128 - size.bits());

Uh oh!

Conversation

folkertdev commented Feb 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rustbot commented Feb 4, 2026

Uh oh!

rustbot commented Feb 4, 2026

Uh oh!

This comment has been minimized.

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

okaneco commented Feb 4, 2026

Uh oh!

folkertdev commented Feb 4, 2026

Uh oh!

programmerjake commented Feb 5, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

This comment was marked as resolved.

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

quaternic Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

programmerjake commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

rustbot commented Feb 6, 2026

Uh oh!

rustbot commented Feb 6, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

programmerjake Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

quaternic Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

programmerjake Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

quaternic Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

folkertdev commented Feb 4, 2026 •

edited

Loading

quaternic Feb 5, 2026 •

edited

Loading

programmerjake commented Feb 5, 2026 •

edited

Loading

programmerjake Feb 6, 2026 •

edited

Loading

quaternic Feb 6, 2026 •

edited

Loading

programmerjake Feb 6, 2026 •

edited

Loading

quaternic Feb 6, 2026 •

edited

Loading

tarcieri Feb 8, 2026 •

edited

Loading