librz/util/vector: minor RzVector/RzPVector performance optimizations by notxvilka · Pull Request #6467 · rizinorg/rizin

notxvilka · 2026-06-06T20:41:20Z

Your checklist for this pull request

I've read the guidelines for contributing to this repository.
I made sure to follow the project's coding style.
I've documented every RZ_API function and struct this PR changes.
I've added tests that prove my changes are effective (required for changes to RZ_API).
I've updated the Rizin book with the relevant information (if needed).
I've used AI tools to generate fully or partially these code changes and I'm sure the changes are not copyrighted by somebody else.

Detailed description

This PR contains only self-contained, easy-to-review changes with measurable benefits.

This PR speeds up rz_vector_sort by removing a per-recursive-call scratch allocation and a redundant
double comparator evaluation from the quicksort hot loop. It also simplifies the index
math in rz_pvector_remove_data. All changes are behaviour-preserving: the sort
produces byte-identical output to the previous implementation for the same input.

rz_vector_sort is ~1.3–1.4× faster for small/medium elements, driven mainly by
calling the comparator once per element instead of up to twice. Large (> 256-byte)
elements are dominated by per-element memcpy and are essentially unchanged.

| Case | Before | After | Speedup |
|---|---|---|---|
| `rz_vector_sort`, int, cheap comparator | ~141 ns/elem | ~108 ns/elem | ~1.30× |
| `rz_vector_sort`, int, expensive comparator | ~560 ns/elem | ~400 ns/elem | ~1.40× |
| `rz_vector_sort`, 304-byte elements (heap scratch) | ~318 ns/elem | ~313 ns/elem | ~1.0× (neutral) |
| `rz_pvector_sort` | unchanged | unchanged | not modified |

Test plan

CI is green
Benches are faster

Closing issues

Partially addresses #6096

`vector_quick_sort` allocated its two element-sized scratch buffers (`t` and `pivot`) with malloc/free on every recursive call. For a vector of n elements the sort makes O(n) recursive calls, i.e. O(n) malloc/free pairs purely for scratch space, and any one of those allocations could fail part-way through the sort. Split the function into a small entry point that allocates the two buffers once and a recursive worker that receives them as scratch. The buffers are reused across the whole recursion (each partition step finishes using them before recursing, and the recursion is sequential, so sharing one pair is safe). Small elements (the common case, including every RzPVector-backed sort) use stack buffers and allocate nothing at all; only elements larger than 256 bytes fall back to a single heap allocation for the whole sort. The element movement and `rand()`-based pivot selection are unchanged, so the result is identical for any input (verified byte-for-byte against the previous implementation for ascending and descending orders over many random arrays).
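
The scratch-buffer split described above can be sketched in plain C as follows. This is not rizin's `vector_quick_sort`: the names `sketch_quick_sort`, `quick_sort_worker`, `swap_elems`, `elem_cmp_t` and `cmp_int` are hypothetical, the partition scheme (Lomuto with a random pivot) is an assumption, and only the 256-byte stack threshold is taken from the description. It shows the shape of the change: the entry point owns the two buffers, and each recursion level is done with them before handing them down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparator type; rizin's own comparator typedef may differ. */
typedef int (*elem_cmp_t)(const void *a, const void *b, void *user);

/* Elements up to this size use stack scratch (threshold from the PR text). */
#define SCRATCH_STACK_MAX 256

static void swap_elems(char *a, char *b, size_t sz, void *t) {
	if (a == b) {
		return;
	}
	memcpy(t, a, sz);
	memcpy(a, b, sz);
	memcpy(b, t, sz);
}

/* Recursive worker: `t` and `pivot` are caller-owned scratch buffers of `sz` bytes. */
static void quick_sort_worker(char *base, size_t n, size_t sz, elem_cmp_t cmp,
	void *user, bool reverse, void *t, void *pivot) {
	if (n < 2) {
		return;
	}
	size_t last = n - 1;
	/* Random pivot, moved to the last slot and copied into the scratch buffer. */
	swap_elems(base + ((size_t)rand() % n) * sz, base + last * sz, sz, t);
	memcpy(pivot, base + last * sz, sz);
	size_t store = 0;
	for (size_t i = 0; i < last; i++) {
		int c = cmp(base + i * sz, pivot, user);
		if ((c < 0 && !reverse) || (c > 0 && reverse)) {
			swap_elems(base + i * sz, base + store * sz, sz, t);
			store++;
		}
	}
	swap_elems(base + store * sz, base + last * sz, sz, t);
	/* Partitioning no longer needs either buffer, so the recursive calls may reuse them. */
	quick_sort_worker(base, store, sz, cmp, user, reverse, t, pivot);
	quick_sort_worker(base + (store + 1) * sz, last - store, sz, cmp, user, reverse, t, pivot);
}

/* Entry point: obtains the scratch buffers once for the whole sort. */
static bool sketch_quick_sort(void *base, size_t n, size_t sz, elem_cmp_t cmp,
	void *user, bool reverse) {
	assert(cmp != NULL && sz > 0);
	unsigned char t_buf[SCRATCH_STACK_MAX], pivot_buf[SCRATCH_STACK_MAX];
	void *t = t_buf, *pivot = pivot_buf;
	char *heap = NULL;
	if (sz > SCRATCH_STACK_MAX) {
		heap = malloc(2 * sz); /* a single allocation for both buffers */
		if (!heap) {
			return false;
		}
		t = heap;
		pivot = heap + sz;
	}
	quick_sort_worker(base, n, sz, cmp, user, reverse, t, pivot);
	free(heap);
	return true;
}

/* Example comparator on a leading int (also usable for structs whose first member is an int). */
static int cmp_int(const void *a, const void *b, void *user) {
	(void)user;
	int x = *(const int *)a, y = *(const int *)b;
	return (x > y) - (x < y);
}
```

With `sz` at most 256 the sketch never calls `malloc`, which mirrors the claim that sorts of pointer-sized RzPVector elements allocate nothing; a 304-byte element takes the single heap allocation instead.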

The partition loop tested the element against the pivot with two separate calls to the comparator: `if ((cmp(VEC_INDEX(a, i), pivot, user) < 0 && !reverse) || (cmp(VEC_INDEX(a, i), pivot, user) > 0 && reverse)) {`. Because `cmp` is an opaque function pointer, the compiler cannot merge the two calls, so the comparator was invoked twice for every element when `reverse` is set, and twice for every element not less than the pivot otherwise. Compute the result once into a local and test that: `int c = cmp(VEC_INDEX(a, i), pivot, user); if ((c < 0 && !reverse) || (c > 0 && reverse)) {`. This halves comparator calls in the worst case and is a clear win whenever the comparator is non-trivial (the common case for struct elements). Measured on a shared host: ~12-14% faster for int sorting and ~30% faster with a moderately expensive comparator. The ordering is unchanged (verified byte-for-byte).
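
To see where the saving comes from, the sketch below counts comparator invocations for the old and new forms of the partition test over the same array. The counting comparator and the `count_old`/`count_new` helpers are hypothetical; they reproduce only the condition quoted above, and C's `&&`/`||` short-circuiting decides whether the second call in the old form happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static size_t cmp_calls; /* comparator invocations since the last reset */

static int counting_cmp(const void *a, const void *b, void *user) {
	(void)user;
	cmp_calls++;
	int x = *(const int *)a, y = *(const int *)b;
	return (x > y) - (x < y);
}

/* Old form: the comparator appears twice in the condition.
 * Returns how many elements would be moved left of the pivot. */
static size_t count_old(const int *v, size_t n, int pivot, bool reverse) {
	size_t left = 0;
	cmp_calls = 0;
	for (size_t i = 0; i < n; i++) {
		if ((counting_cmp(&v[i], &pivot, NULL) < 0 && !reverse) ||
			(counting_cmp(&v[i], &pivot, NULL) > 0 && reverse)) {
			left++;
		}
	}
	return left;
}

/* New form: one call per element, the result kept in a local. */
static size_t count_new(const int *v, size_t n, int pivot, bool reverse) {
	size_t left = 0;
	cmp_calls = 0;
	for (size_t i = 0; i < n; i++) {
		int c = counting_cmp(&v[i], &pivot, NULL);
		if ((c < 0 && !reverse) || (c > 0 && reverse)) {
			left++;
		}
	}
	return left;
}
```

For `{1, 5, 3, 7, 3, 9}` against pivot 4, `count_old` makes 9 calls in ascending mode (one for each of the three smaller elements, two for each of the others) and 12 with `reverse` set, while `count_new` makes 6 in both modes; both agree on the partition.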

The index of the located slot was computed as `size_t index = (el - (void **)vec->v.a) * sizeof(void **) / vec->v.elem_size;`. For an RzPVector the element size is always `sizeof(void *)`, so the `* sizeof(void **) / vec->v.elem_size` factor is identically 1 and the pointer difference `el - (void **)vec->v.a` already yields the index directly. Drop the redundant scaling, which removes a multiply and a divide and makes the intent clear. Behaviour is unchanged.
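
A small standalone illustration of why the scaling can be dropped: subtracting two `void **` pointers already counts elements rather than bytes, and with an element size of `sizeof(void *)` the old factor reduces to 1 (assuming, as on mainstream platforms, that `sizeof(void **) == sizeof(void *)`). The helpers `find_slot_index` and `old_index_formula` are hypothetical stand-ins, not rizin functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lookup: index of the slot in base[0..n) holding `data`, or n if absent. */
static size_t find_slot_index(void **base, size_t n, const void *data) {
	for (void **el = base; el < base + n; el++) {
		if (*el == data) {
			/* Subtracting two void ** pointers yields an element count, not a byte count. */
			return (size_t)(el - base);
		}
	}
	return n;
}

/* The previous formula, kept only to show it reduces to the same value
 * when elem_size == sizeof(void *). */
static size_t old_index_formula(void **el, void **base, size_t elem_size) {
	return (size_t)(el - base) * sizeof(void **) / elem_size;
}
```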

The existing sort tests only sort 4-5 small elements, and there was no test for rz_pvector_remove_data. Add coverage for the code paths exercised by the sort changes and the remove_data cleanup:

- test_vector_sort_large: sort 2000 heavily duplicated ut32 values ascending and descending, verifying that the result is ordered and a permutation of the input (compared against a reference qsort). Drives the recursion deeply and exercises the shared scratch buffers.
- test_vector_sort_large_elem: sort 400 elements of 304 bytes each, taking the heap-allocated scratch fallback, and check that the full payload (not just the key) stays consistent through all the element moves.
- test_pvector_remove_data: remove interior, first and last elements by value while preserving order, and confirm that removing an absent value is a no-op.

All tests pass on both the previous and the optimized implementation (the sort and remove_data changes are behaviour-preserving).
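
The ordered-and-permutation check described for test_vector_sort_large can be expressed as a single comparison against a `qsort()` reference, since a result equal to the sorted copy is necessarily both ordered and a permutation of the input. The sketch below is hypothetical (`is_sorted_permutation`, `fill_duplicated` and `cmp_u32` are illustrative names) and uses plain `uint32_t` and `assert` rather than rizin's `ut32` and its test macros; the sort under test would be run on the `out` buffer before the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int cmp_u32(const void *a, const void *b) {
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
	return (x > y) - (x < y);
}

/* True if `out` equals `in` sorted ascending (or descending when `reverse`),
 * using qsort() on a copy of `in` as the reference ordering. */
static bool is_sorted_permutation(const uint32_t *out, const uint32_t *in, size_t n, bool reverse) {
	if (n == 0) {
		return true;
	}
	uint32_t *ref = malloc(n * sizeof(*ref));
	if (!ref) {
		return false;
	}
	memcpy(ref, in, n * sizeof(*ref));
	qsort(ref, n, sizeof(*ref), cmp_u32);
	bool ok = true;
	for (size_t i = 0; i < n && ok; i++) {
		ok = out[i] == ref[reverse ? n - 1 - i : i];
	}
	free(ref);
	return ok;
}

/* Fills `v` with heavily duplicated values: only 16 distinct keys. */
static void fill_duplicated(uint32_t *v, size_t n, unsigned seed) {
	srand(seed);
	for (size_t i = 0; i < n; i++) {
		v[i] = (uint32_t)(rand() % 16);
	}
}
```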

bench_vector.c benchmarked only remove_at and swap. Add sort benchmarks so the suite covers the functions touched by the sort optimizations and can be run against the old and new librz for before/after numbers:

- rz_vector_sort over 4k ut64 with a cheap comparator
- rz_vector_sort over 4k ut64 with a deliberately expensive comparator (shows the effect of evaluating the comparator once per element)
- rz_pvector_sort over 4k pointers (reference; pvector sort is unchanged)

Each iteration refills the buffer from an unsorted master copy via a single memcpy before sorting; that overhead is identical across builds, so the measured delta reflects the sort.
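
The refill-then-sort pattern described above might look roughly like the harness below. It does not reproduce bench_vector.c: `bench_ns_per_elem`, `cmp_u64` and `cmp_u64_expensive` are hypothetical, the standard `qsort()` stands in for `rz_vector_sort`, and POSIX `clock_gettime(CLOCK_MONOTONIC, ...)` is assumed for timing. The `memcpy` refill sits inside the timed loop, so its cost is included in every figure but is the same across builds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static int cmp_u64(const void *a, const void *b) {
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
	return (x > y) - (x < y);
}

/* Same ordering as cmp_u64, plus deliberate extra work on every call. */
static int cmp_u64_expensive(const void *a, const void *b) {
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
	volatile uint64_t sink = 0; /* volatile keeps the wasted work from being optimized away */
	for (int i = 0; i < 32; i++) {
		sink += x ^ (y >> (i & 7));
	}
	(void)sink;
	return (x > y) - (x < y);
}

/* Average wall-clock nanoseconds per element over `iters` refill+sort rounds. */
static double bench_ns_per_elem(const uint64_t *master, uint64_t *work, size_t n, int iters,
	int (*cmp)(const void *, const void *)) {
	struct timespec t0, t1;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int it = 0; it < iters; it++) {
		memcpy(work, master, n * sizeof(*work)); /* restore the unsorted input */
		qsort(work, n, sizeof(*work), cmp);      /* stand-in for the sort being measured */
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
	return ns / ((double)iters * (double)n);
}
```

Linking the same harness against the old and the new library and comparing the two ns/elem figures gives the before/after delta; absolute numbers depend on the host.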

codecov · 2026-06-07T16:01:11Z

Codecov Report

❌ Patch coverage is 66.66667% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.23%. Comparing base (d723950) to head (b1762dc).
⚠️ Report is 5 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| librz/util/vector.c | 66.66% | 6 Missing and 2 partials ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| librz/util/vector.c | `79.30% <66.66%> (-0.09%)` ⬇️ |

... and 14 files with indirect coverage changes

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d723950...b1762dc.


XVilka added 5 commits June 7, 2026 00:34

github-actions Bot added rz-test RzUtil labels Jun 6, 2026

notxvilka changed the title ~~librz/util/list: minor RzVector/RzPVector performance optimizations~~ librz/util/vector: minor RzVector/RzPVector performance optimizations Jun 6, 2026

notxvilka marked this pull request as ready for review June 6, 2026 20:48

notxvilka requested review from Rot127, b1llow, kazarmy, ret2libc, thestr4ng3r and wargio as code owners June 6, 2026 20:48

notxvilka added this to the 0.9.0 milestone Jun 6, 2026

notxvilka added optimization performance labels Jun 6, 2026

notxvilka wants to merge 5 commits into dev from asan-rzvector-optimization

notxvilka commented Jun 6, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

notxvilka commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Jun 7, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

notxvilka commented Jun 6, 2026 •

edited

Loading