Memoize name_depth to speed up resolution sorting by 3x #654

Open
st0012 wants to merge 1 commit into main from optimize-name-depth-sorting

Conversation


@st0012 st0012 commented Mar 10, 2026

Summary

  • Pre-compute name_depth for all names into an IdentityHashMap<NameId, u32> cache before sorting, eliminating redundant recursive walks during O(n log n) comparisons
  • Switch sort_by to sort_unstable_by since the full (depth, uri, offset) key provides deterministic ordering without needing stability

Problem

Profiling with samply revealed that 88% of sampled resolution time was spent inside name_depth closures during prepare_units sorting. The function recursively walks parent_scope and nesting chains to compute depth, and was called from the sort comparator on every comparison — recomputing the same depths millions of times with no memoization.
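The shape of the problem can be sketched in a few lines. `NameId`, `Name`, and a plain `HashMap` below are hypothetical stand-ins for the crate's real types, which are not shown in this PR:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the crate's real arena types.
type NameId = u32;

struct Name {
    parent_scope: Option<NameId>,
}

// Pre-fix shape: every call re-walks the parent chain all the way to the
// root, and the sort comparator invokes this twice per comparison.
fn name_depth(id: NameId, names: &HashMap<NameId, Name>) -> u32 {
    match names[&id].parent_scope {
        Some(parent) => 1 + name_depth(parent, names),
        None => 0,
    }
}
```

With ~880K names and O(n log n) comparisons, each of which triggers two of these full walks, the redundant work dominates the profile.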

Note: the depth sort is a correctness requirement, not just an optimization. Removing it entirely causes 13 test failures — the resolution loop's made_progress check gives up when children are processed before parents.

Fix

Compute depths once for all names in a single memoized pass, then use O(1) lookups in the sort comparators.
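A minimal sketch of that pass, using a plain `HashMap` where the PR uses an `IdentityHashMap<NameId, u32>` (the `Name` type here is a hypothetical stand-in for the real one):

```rust
use std::collections::HashMap;

type NameId = u32;

struct Name {
    parent_scope: Option<NameId>,
}

// Resolve one depth, caching every intermediate result so each parent
// chain is walked at most once across the entire pass.
fn depth_of(id: NameId, names: &HashMap<NameId, Name>, cache: &mut HashMap<NameId, u32>) -> u32 {
    if let Some(&d) = cache.get(&id) {
        return d;
    }
    let d = match names[&id].parent_scope {
        Some(parent) => 1 + depth_of(parent, names, cache),
        None => 0,
    };
    cache.insert(id, d);
    d
}

// Single pass before sorting; afterwards the sort comparators only do
// O(1) cache lookups instead of recursive walks.
fn depth_cache(names: &HashMap<NameId, Name>) -> HashMap<NameId, u32> {
    let mut cache = HashMap::new();
    for &id in names.keys() {
        depth_of(id, names, &mut cache);
    }
    cache
}
```

This turns the total depth work from O(comparisons × chain length) into O(number of names), since every memoized entry is computed exactly once.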

Benchmark

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Resolution | 50.2s (79.4%) | 16.7s (57.7%) | -66.7% |
| Total | 63.2s | 29.0s | -54.1% |
| Listing | 0.76s | 0.80s | ~same |
| Indexing | 11.5s | 10.8s | ~same |
| Querying | 0.72s | 0.66s | ~same |
| Memory (RSS) | 4863 MB | 4833 MB | ~same |
| Declarations | 879,648 | 879,648 | identical |
| Definitions | 1,043,725 | 1,043,725 | identical |

Resolution went from 50.2s → 16.7s (3x speedup). Total indexing time cut in half. Output is identical (same counts, same orphan rate), confirming correctness.

@st0012 st0012 force-pushed the optimize-name-depth-sorting branch 3 times, most recently from 8f44afe to 9963652 on March 10, 2026 22:14
@st0012 st0012 self-assigned this Mar 10, 2026
```diff
  // When the depth is the same, sort by URI and offset to maintain determinism
- definitions.sort_by(|(_, (name_a, uri_a, offset_a)), (_, (name_b, uri_b, offset_b))| {
+ definitions.sort_unstable_by(|(_, (name_a, uri_a, offset_a)), (_, (name_b, uri_b, offset_b))| {
      (Self::name_depth(name_a, names), uri_a, offset_a).cmp(&(Self::name_depth(name_b, names), uri_b, offset_b))
```
Member Author


This change is not required for the rest of the PR, but since no two entries can share the same (depth, uri, offset) key, we can use an unstable sort for a bit of extra speedup.
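The reasoning can be checked with a small self-contained example; plain tuples stand in for the real entries here:

```rust
/// Returns the entries sorted by the full (depth, uri, offset) key.
/// Because that key is unique per entry, no two elements ever compare
/// equal, so an unstable sort cannot reorder "ties" and its output is
/// identical to a stable sort's.
fn sort_entries(mut items: Vec<(u32, &'static str, u32)>) -> Vec<(u32, &'static str, u32)> {
    items.sort_unstable_by(|a, b| a.cmp(b));
    items
}
```

`sort_unstable_by` avoids the allocation and extra moves a stable merge sort needs, which is where the small speedup comes from.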

// Sort constant references based on their name complexity so that simpler names are always first
const_refs.sort_by(|(_, (name_a, uri_a, offset_a)), (_, (name_b, uri_b, offset_b))| {
(Self::name_depth(name_a, names), uri_a, offset_a).cmp(&(Self::name_depth(name_b, names), uri_b, offset_b))
const_refs.sort_unstable_by(|(_, (name_a, uri_a, offset_a)), (_, (name_b, uri_b, offset_b))| {
Member Author


Same as the above

@st0012 st0012 marked this pull request as ready for review March 10, 2026 23:00
@st0012 st0012 requested a review from a team as a code owner March 10, 2026 23:00
st0012 added a commit that referenced this pull request Mar 11, 2026
## Summary

- Add `[profile.profiling]` to `rust/Cargo.toml` — inherits release
optimizations (LTO, opt-level 3, single codegen unit) with debug symbols
enabled for readable flamegraphs
- Add `.claude/skills/profiling/SKILL.md` — project-specific skill for
profiling the Rubydex indexer

## What the skill covers

- **samply** for interactive CPU flamegraphs (Firefox Profiler in
browser)
- **macOS `sample`** for text-based call trees (non-interactive/agent
use)
- Phase isolation with `--stop-after` and `--stats`
- How to read profiles (self-time, concentration vs. spread, allocation
pressure)
- Memory profiling with `utils/mem-use`
- Before/after comparison workflow with delta tables
- Troubleshooting (permissions, missing debug symbols, run variance)

## Eval results

Ran the skill in a fresh session. It successfully guided profiling
end-to-end and identified the dominant bottleneck:

```
Profiling Results

Summary
┌────────────┬─────────┬───────┐
│   Phase    │  Time   │   %   │
├────────────┼─────────┼───────┤
│ Listing    │ 0.67s   │ 1.3%  │
├────────────┼─────────┼───────┤
│ Indexing   │ 10.0s   │ 19.0% │
├────────────┼─────────┼───────┤
│ Resolution │ 41.3s   │ 78.4% │
├────────────┼─────────┼───────┤
│ Querying   │ 0.70s   │ 1.3%  │
├────────────┼─────────┼───────┤
│ Total      │ 52.7s   │       │
├────────────┼─────────┼───────┤
│ Memory     │ 4756 MB │       │
└────────────┴─────────┴───────┘

The Bottleneck: name_depth in sorting (100% of sampled resolution time)

┌───────────────────────────────────┬──────────────┬───────┐
│             Function              │ Self Samples │   %   │
├───────────────────────────────────┼──────────────┼───────┤
│ name_depth (nesting closure)      │ 12,536       │ 59.6% │
├───────────────────────────────────┼──────────────┼───────┤
│ name_depth (parent_scope closure) │ 5,984        │ 28.5% │
├───────────────────────────────────┼──────────────┼───────┤
│ quicksort internals               │ 1,447        │ 6.9%  │
├───────────────────────────────────┼──────────────┼───────┤
│ memcmp                            │ 884          │ 4.2%  │
└───────────────────────────────────┴──────────────┴───────┘

88% of all sampled time is spent in name_depth, called from sort_by
in prepare_units. The function recursively walks parent_scope and
nesting chains with zero memoization — and the sort invokes it on
every one of its O(n log n) comparisons.
```

This finding led directly to #654 (3x resolution speedup via memoized
depth computation).

Pre-compute name depths for all names in a single pass before sorting,
eliminating redundant recursive walks during O(n log n) comparisons.

Previously, name_depth was called from the sort comparator for every
comparison, each time recursively walking parent_scope and nesting chains
to the root. With ~880K names and deep hierarchies (up to 130 levels),
this was the dominant bottleneck: 88% of sampled resolution time was
spent in name_depth closures.

The fix computes depths once into an IdentityHashMap<NameId, u32> cache,
then uses direct lookups in the sort comparators. Also switches to
sort_unstable_by since the full (depth, uri, offset) key provides
deterministic ordering without needing stability.
@st0012 st0012 force-pushed the optimize-name-depth-sorting branch from 9963652 to 37fbd24 on March 11, 2026 22:39