Skip to content

perf(decode::tree): preallocate Vec based on worst-case length#2462

Merged
Sebastian Thiel (Byron) merged 2 commits intoGitoxideLabs:mainfrom
datdenkikniet:micro-optimize
Mar 22, 2026
Merged

perf(decode::tree): preallocate Vec based on worst-case length#2462
Sebastian Thiel (Byron) merged 2 commits intoGitoxideLabs:mainfrom
datdenkikniet:micro-optimize

Conversation

@datdenkikniet
Copy link
Contributor

@datdenkikniet datdenkikniet commented Mar 7, 2026

A micro-optimization that is likely only really beneficial for the specific benchmark that it affects (TreeIter).

We can make a (terrible) guess at how many elements our tree contains. This lets us allocate at least as many entries as we will need, allowing the function to never re-allocate. Allocating vectors takes amortized constant time, but given that the time cost per item is so low (on the order of nanoseconds), avoiding them outright provides a pretty serious speedup.

I have no clue what the project's stance on over-allocation is: if they should be avoided, and/or if this specific optimization is unnecesary, let's close this PR (but remember that it exists).

Benchmark (cargo bench --bench decode-objects -- TreeRef, diff against main):

TreeRef()               time:   [58.240 ns 58.335 ns 58.458 ns]
                        change: [−38.740% −37.466% −36.027%] (p = 0.00 < 0.05)
                        Performance has improved.

We can make a (terrible) guess at how many elements our tree
contains. This lets us allocate at least as many entries as
we will need, allowing the function to never re-allocate.
Copy link
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 888feb71c7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".

const HASH_LEN_FIXME: usize = 20;
let lower_bound_single_entry = 2 + HASH_LEN_FIXME; // 2 = space + trailing zero
let upper_bound_entries = i.len() / lower_bound_single_entry;
let mut out = Vec::with_capacity(upper_bound_entries);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid preallocating from unvalidated tree byte length

tree() now calls Vec::with_capacity(i.len() / 22) before validating a single entry, so malformed or attacker-controlled tree payloads can force a large allocation and OOM even though parsing immediately returns an error. Previously allocations only grew with successfully decoded entries, so this commit introduces a memory-amplification path for invalid inputs (e.g., corrupted objects received from untrusted repos).

Useful? React with 👍 / 👎.

@Byron
Copy link
Member

Sebastian Thiel (Byron) commented Mar 22, 2026

Thanks a lot! This is a substantial improvement. Let's see if anybody notices :D.

The below ran this branch first, then main, hence the regression.

Gnuplot not found, using plotters backend
TreeRef()               time:   [75.394 ns 77.303 ns 79.287 ns]
                        change: [+42.134% +44.023% +46.136%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  6 (6.00%) high mild
  12 (12.00%) high severe

TreeRefIter()           time:   [27.541 ns 27.850 ns 28.201 ns]
                        change: [+12.484% +14.426% +16.397%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)

I will also bound the amount of memory, for safety.

After that, we have this:

TreeRef()               time:   [61.171 ns 61.564 ns 61.987 ns]
                        change: [−19.327% −18.218% −17.224%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe

TreeRefIter()           time:   [31.650 ns 31.987 ns 32.358 ns]
                        change: [+9.6066% +12.021% +14.383%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

So it seems allocating less large vectors is beneficial here.

This should already have the desired effect.
@Byron Sebastian Thiel (Byron) merged commit 8620420 into GitoxideLabs:main Mar 22, 2026
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants