Description
I've been looking at ways to reduce RA memory use, because for my use case it's regularly using >80GB! It has to dip into swap and performance tanks.
Right now, rust-analyzer always uses `GreenNodeBuilder::default()`, which creates a fresh `NodeCache` for every file/text section.
I ran into issues implementing it the "right" way because I couldn't figure out how to get a `NodeCache` into the database methods (such as `base-db:lib.rs:parse`).
So, to run some quick benchmarks and see whether it's worth pursuing, I implemented cache sharing the "wrong" way (using a global static mutex). This gave a modest performance increase and a substantial reduction in peak memory.
I also tried:
- Interning green nodes when they are "mutated" with methods such as `splice_children`. This seems promising for continuous running: the cache hit rate for mutations was 45-60% depending on the repo, with no noticeable performance degradation.
- Disabling or tweaking the `>3 children` NodeCache heuristic. Disabling it gave another big win in peak memory but cost a lot of performance; tweaking it is better.
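For context, the heuristic is a child-count threshold: nodes with more than N children bypass the cache, since large nodes rarely repeat and hashing them is expensive. A minimal sketch (the function name and parameterization are mine; only the `>3` cutoff and the tweaked value 4 come from the experiments above):

```rust
// Illustrative child-count caching heuristic: only nodes with at most
// `threshold` children are interned. rust-analyzer's current behavior
// corresponds to threshold 3; "Heuristic 4" in the tables raises it to 4,
// trading a little hashing work for lower peak memory.
fn should_cache(child_count: usize, threshold: usize) -> bool {
    child_count <= threshold
}

fn main() {
    // Default heuristic (threshold 3): a 4-child node is not cached.
    assert!(should_cache(3, 3));
    assert!(!should_cache(4, 3));
    // Tweaked heuristic (threshold 4): one more size class becomes cacheable.
    assert!(should_cache(4, 4));
    println!("ok");
}
```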
Running rust-analyzer analysis-stats on itself in release mode:
Metric | Baseline | Global Cache | Global Cache Without Heuristic | Global Cache With Heuristic 4 |
---|---|---|---|---|
Peak Memory | 4675.54MB | 4083.73MB (-12.7%) | 3739.20MB (-20.0%) | 3930.96MB (-16%) |
Idle Memory (post LRU clear) | 2908MB | 2869MB (-1.3%) | 2863MB (-1.5%) | 2867MB (-1.4%) |
NodeCache::node hitrate | 44.27% | 72.51% | 83.60% | 76.62% |
NodeCache::token hitrate | 87.40% | 98.38% | 98.38% | 98.38% |
Runtime | 87.37s | 82.05s (-6.1%) | 91.93s (+5.2%) | 81.60s (-6.6%) |
Because the duplication rate grows with codebase size, the memory reduction should scale superlinearly. I chose a larger repo, but not one so large that it dips into swap, which would skew the performance metrics.
Stats from a larger repo:
Metric | Baseline | Global Cache | Global Cache Without Heuristic | Global Cache With Heuristic 4 |
---|---|---|---|---|
Peak Memory | 23015.48MB | 20466.32MB (-11.1%) | 17741.96MB (-22.9%) | 19970.84MB (-13.2%) |
Idle Memory (post LRU clear) | 11944MB | 11936MB (-0.1%) | 11927MB (-0.1%) | 11933MB (-0.1%) |
NodeCache::node hitrate | 44.30% | 71.13% | 87.33% | 76.23% |
NodeCache::token hitrate | 89.38% | 99.31% | 99.31% | 99.31% |
Runtime | 148.49s | 145.36s (-2.1%) | 167.40s (+12.7%) | 144.28s (-2.8%) |
I didn't run any benchmarks for continuous editing, but it should benefit even more: continuous editing performs many small re-parses, so cache sharing has more opportunity to help.