Skip to content

libstore: Use boost::regex for GC root discovery #13142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 18, 2025

Conversation

xokdvium
Copy link
Contributor

@xokdvium xokdvium commented May 6, 2025

Motivation

As it turns out using std::regex is actually the bottleneck for root discovery. Just substituting std:: -> boost:: makes root discovery twice as fast (3x if counting only userspace time).

Some rather ad-hoc measurements to motivate the switch:

(On master)

nix build github:nixos/nix/1e822bd4149a8bce1da81ee2ad9404986b07914c#nix-cli --out-link result-1e822bd4149a8bce1da81ee2ad9404986b07914c
taskset -c 2,3 hyperfine "result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0"
Benchmark 1: result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0
  Time (mean ± σ):     481.6 ms ±   3.9 ms    [User: 336.2 ms, System: 142.0 ms]
  Range (min … max):   474.6 ms … 487.7 ms    10 runs

(After this patch)

taskset -c 2,3 hyperfine "result/bin/nix store gc --dry-run --max 0"
Benchmark 1: result/bin/nix store gc --dry-run --max 0
  Time (mean ± σ):     254.7 ms ±   9.7 ms    [User: 111.1 ms, System: 141.3 ms]
  Range (min … max):   246.5 ms … 281.3 ms    10 runs

boost::regex is a drop-in replacement for std::regex, but much faster. Doing a simple before/after comparison doesn't surface any change in behavior:

result/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l
result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l

Note

This really doesn't make a dent in the actual runtime of GC itself.

Context


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

As it turns out using `std::regex` is actually the bottleneck
for root discovery. Just substituting `std::` -> `boost::`
makes root discovery twice as fast (3x if counting only userspace time).

Some rather ad-hoc measurements to motivate the switch:

(On master)

```
nix build github:nixos/nix/1e822bd4149a8bce1da81ee2ad9404986b07914c#nix-cli --out-link result-1e822bd4149a8bce1da81ee2ad9404986b07914c
taskset -c 2,3 hyperfine "result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0"
Benchmark 1: result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0
  Time (mean ± σ):     481.6 ms ±   3.9 ms    [User: 336.2 ms, System: 142.0 ms]
  Range (min … max):   474.6 ms … 487.7 ms    10 runs
```

(After this patch)

```
taskset -c 2,3 hyperfine "result/bin/nix store gc --dry-run --max 0"
Benchmark 1: result/bin/nix store gc --dry-run --max 0
  Time (mean ± σ):     254.7 ms ±   9.7 ms    [User: 111.1 ms, System: 141.3 ms]
  Range (min … max):   246.5 ms … 281.3 ms    10 runs
```

`boost::regex` is a drop-in replacement for `std::regex`, but much faster.
Doing a simple before/after comparison doesn't surface any change in behavior:

```
result/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l
result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l
```
@xokdvium xokdvium requested a review from Ericson2314 as a code owner May 6, 2025 22:14
@edolstra
Copy link
Member

edolstra commented May 7, 2025

Please check that this doesn't cause a dependency on libicu (see #7762).

@Mic92
Copy link
Member

Mic92 commented May 13, 2025

Please check that this doesn't cause a dependency on libicu (see #7762).

@xokdvium have you checked this?

@xokdvium
Copy link
Contributor Author

Hmm, libnixstore.so already depends on libboost_regex.so.1.87.0 as well as libicu*so.76 on master's ed52176:

$ nix build github:nixos/nix/ed521760bc92a93c27b8e0a69b5e89f610cef697#nix-store --out-link result-ed521760bc92a93c27b8e0a69b5e89f610cef697
$ ldd result-ed521760bc92a93c27b8e0a69b5e89f610cef697/lib/libnixstore.so
....
        libboost_regex.so.1.87.0 => /nix/store/z4k2zqmkpb7byagshlxv1ik7iijp5b9h-boost-1.87.0/lib/libboost_regex.so.1.87.0 (0x00007f4db3e8e000)
        libicudata.so.76 => /nix/store/6pr4vfwd4s376sfa784d2ad0b82gdd2d-icu4c-76.1/lib/libicudata.so.76 (0x00007f4db202b000)
        libicui18n.so.76 => /nix/store/6pr4vfwd4s376sfa784d2ad0b82gdd2d-icu4c-76.1/lib/libicui18n.so.76 (0x00007f4db1c78000)
        libicuuc.so.76 => /nix/store/6pr4vfwd4s376sfa784d2ad0b82gdd2d-icu4c-76.1/lib/libicuuc.so.76 (0x00007f4db1a47000)
        libunistring.so.5 => /nix/store/mbx9ii53lzjlrsnlrfmzpwm33ynljwdn-libunistring-1.3/lib/libunistring.so.5 (0x00007f4db185a000)
....

I guess this is boost's auto-link at work and some boost libraries that we are already using transitively include boost::regex headers https://github.com/boostorg/regex/blob/f851a08050dd6d4bdea552d2cf614c69d8d3cff9/include/boost/regex/v5/cregex.hpp#L28-L39.

Should I also modify packaging/dependencies.nix to disable enableIcu?

xokdvium added 2 commits May 13, 2025 08:47
This reduces the closure size on master by 40MiB.

```
$ nix build github:nixos/nix/1e822bd4149a8bce1da81ee2ad9404986b07914c#nix-store --out-link closure-on-master
$ nix build .#nix-store -L --out-link closure-without-icu
$ nix path-info --closure-size -h ./closure-on-master
/nix/store/8gwr38m5h6p7245ji9jv28a2a11w1isx-nix-store-2.29.0pre  124.4 MiB
$ nix path-info --closure-size -h ./closure-without-icu
/nix/store/k0gwfykjqpnmaqbwh23nk55lhanc9g24-nix-store-2.29.0pre   86.6 MiB
```
@xokdvium xokdvium requested a review from edolstra as a code owner May 13, 2025 08:52
@xokdvium
Copy link
Contributor Author

xokdvium commented May 13, 2025

@edolstra @Mic92 Seems like master already links to libicu somehow. #13142 (comment). I've disabled locale support in packaging/dependencies in f3090ef. Does that make sense? Should that be split off into a different PR?
AFAICT this is a closure-size regression from meson migration.

Copy link
Member

@Mic92 Mic92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me.

@xokdvium
Copy link
Contributor Author

@edolstra ping wrt libicu dependency on master. If you'd like I could split f3090ef into a separate PR if that closure size reduction warrants a backport.

@@ -63,6 +63,7 @@ scope: {
"--with-coroutine"
"--with-iostreams"
];
enableIcu = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could in a different PR probably use disallowedReferences to make sure we never reference to icu in nix.

@Mic92 Mic92 merged commit e7078d4 into NixOS:master May 18, 2025
12 checks passed
@Mic92
Copy link
Member

Mic92 commented May 18, 2025

@edolstra ping wrt libicu dependency on master. If you'd like I could split f3090ef into a separate PR if that closure size reduction warrants a backport.

Unless there is someone else interested, I would just backport this to 2.29.

@Mic92 Mic92 added the backport 2.29-maintenance Automatically creates a PR against the branch label May 18, 2025
@Mic92
Copy link
Member

Mic92 commented May 18, 2025

@xokdvium I wonder if this is also a candidate for builtins.match if it that much faster. I remember the gcc version of std::regex also would stackoverflow on slightly larger input.

@xokdvium
Copy link
Contributor Author

@Mic92 that's the plan. There's been several attempts at this (see #7762) and also Lix did the migration as well (lix-project/lix@447212f). Before we do that though I think we need a good test suite. One possible way to do that is to extract builtins.match calls from nixpkgs by instrumenting the match primop and produce gtest/language level regression suite in-tree from that.

@xokdvium xokdvium deleted the gc-root-boost-regex branch May 18, 2025 19:54
Mic92 added a commit that referenced this pull request May 18, 2025
…3142

libstore: Use `boost::regex` for GC root discovery (backport #13142)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.29-maintenance Automatically creates a PR against the branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants