
librz/util/vector: minor RzVector/RzPVector performance optimizations #6467

Open
notxvilka wants to merge 5 commits into dev from asan-rzvector-optimization



@notxvilka
Contributor

@notxvilka notxvilka commented Jun 6, 2026

Your checklist for this pull request

  • I've read the guidelines for contributing to this repository.
  • I made sure to follow the project's coding style.
  • I've documented every RZ_API function and struct this PR changes.
  • I've added tests that prove my changes are effective (required for changes to RZ_API).
  • I've updated the Rizin book with the relevant information (if needed).
  • I've used AI tools to fully or partially generate these code changes, and I'm sure the changes are not copyrighted by somebody else.

Detailed description

This PR contains only self-contained, easy-to-review changes with measurable benefits.

This PR speeds up rz_vector_sort by removing a per-recursive-call scratch
allocation and a redundant second comparator evaluation from the quicksort hot
loop. It also simplifies the index math in rz_pvector_remove_data. All changes
are behaviour-preserving: the sort produces byte-identical output to the
previous implementation for the same input.

rz_vector_sort is ~1.3–1.4× faster for small and medium elements, mainly because
the comparator is now called once per element instead of up to twice. Sorting
large (> 256-byte) elements is dominated by per-element memcpy, so it is
essentially unchanged.

| Case | Before | After | Speedup |
| --- | --- | --- | --- |
| rz_vector_sort, int, cheap comparator | ~141 ns/elem | ~108 ns/elem | ~1.30× |
| rz_vector_sort, int, expensive comparator | ~560 ns/elem | ~400 ns/elem | ~1.40× |
| rz_vector_sort, 304-byte elements (heap scratch) | ~318 ns/elem | ~313 ns/elem | ~1.0× (neutral) |
| rz_pvector_sort | unchanged | unchanged | not modified |

Test plan

  • CI is green
  • Benches are faster

Closing issues

Partially addresses #6096

XVilka added 5 commits June 7, 2026 00:34
vector_quick_sort allocated its two element-sized scratch buffers (t and
pivot) with malloc/free on every recursive call. For a vector of n elements
the sort makes O(n) recursive calls, i.e. O(n) malloc/free pairs purely for
scratch space, and any one of those allocations could fail partway through
the sort.

Split the function into a small entry point that allocates the two buffers
once and a recursive worker that receives them as scratch. The buffers are
reused across the whole recursion (each partition step finishes using them
before recursing, and the recursion is sequential, so sharing one pair is
safe). Small elements (the common case, including every RzPVector-backed
sort) use stack buffers and allocate nothing at all; only elements larger
than 256 bytes fall back to a single heap allocation for the whole sort.
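
The entry-point/worker split described above can be sketched in self-contained C. This is a minimal sketch, not the librz code: quick_sort, quick_sort_worker, sketch_cmp_t, SCRATCH_STACK_MAX and the example comparator cmp_int are hypothetical names, and the 256-byte threshold is taken from the commit message. The entry point decides once where t and pivot live; the recursive worker only borrows them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparator type: (a, b, user) -> <0, 0 or >0. */
typedef int (*sketch_cmp_t)(const void *a, const void *b, void *user);

/* Elements up to this size use stack scratch (threshold from the commit). */
#define SCRATCH_STACK_MAX 256

/* Address of element i in a buffer of sz-byte elements. */
#define ELEM(base, i, sz) ((char *)(base) + (size_t)(i) * (sz))

/* Example comparator for an int key at offset 0. It copies the key out with
 * memcpy because the pivot may live in a plain byte buffer. */
static int cmp_int(const void *a, const void *b, void *user) {
	(void)user;
	int x, y;
	memcpy(&x, a, sizeof x);
	memcpy(&y, b, sizeof y);
	return (x > y) - (x < y);
}

/* Recursive worker: t and pivot are caller-owned scratch. Each level is done
 * with them before it recurses, so one pair serves the whole sort. */
static void quick_sort_worker(void *a, size_t len, size_t sz, sketch_cmp_t cmp,
	bool reverse, void *user, void *t, void *pivot) {
	if (len <= 1) {
		return;
	}
	size_t i = (size_t)rand() % len, j = 0;
	/* Save the random pivot and fill its slot with the last element. */
	memcpy(pivot, ELEM(a, i, sz), sz);
	memmove(ELEM(a, i, sz), ELEM(a, len - 1, sz), sz); /* i may equal len - 1 */
	for (i = 0; i < len - 1; i++) {
		int c = cmp(ELEM(a, i, sz), pivot, user); /* one comparator call */
		if ((c < 0 && !reverse) || (c > 0 && reverse)) {
			/* Swap element i into the prefix that precedes the pivot. */
			memcpy(t, ELEM(a, i, sz), sz);
			memmove(ELEM(a, i, sz), ELEM(a, j, sz), sz); /* i may equal j */
			memcpy(ELEM(a, j, sz), t, sz);
			j++;
		}
	}
	/* Put the pivot at j and move the element that was there to the end. */
	memmove(ELEM(a, len - 1, sz), ELEM(a, j, sz), sz);
	memcpy(ELEM(a, j, sz), pivot, sz);
	quick_sort_worker(a, j, sz, cmp, reverse, user, t, pivot);
	quick_sort_worker(ELEM(a, j + 1, sz), len - j - 1, sz, cmp, reverse, user, t, pivot);
}

/* Entry point: picks scratch storage once. Returns false only if the heap
 * fallback for large elements cannot be allocated. */
static bool quick_sort(void *a, size_t len, size_t sz, sketch_cmp_t cmp,
	bool reverse, void *user) {
	assert(a || len == 0);
	_Alignas(max_align_t) unsigned char tbuf[SCRATCH_STACK_MAX];
	_Alignas(max_align_t) unsigned char pbuf[SCRATCH_STACK_MAX];
	void *t = tbuf, *pivot = pbuf, *heap = NULL;
	if (sz > SCRATCH_STACK_MAX) {
		heap = malloc(2 * sz); /* one allocation holds both buffers */
		if (!heap) {
			return false;
		}
		t = heap;
		pivot = (char *)heap + sz;
	}
	quick_sort_worker(a, len, sz, cmp, reverse, user, t, pivot);
	free(heap);
	return true;
}
```

Small elements never touch the allocator here, and a 304-byte element costs exactly one malloc/free pair per sort instead of one per recursive call.
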

The element movement and rand()-based pivot selection are unchanged, so the
result is identical for any input (verified byte-for-byte against the previous
implementation for ascending and descending orders over many random arrays).

The partition loop tested the element against the pivot with two separate
calls to the comparator:

    if ((cmp(VEC_INDEX(a, i), pivot, user) < 0 && !reverse) ||
        (cmp(VEC_INDEX(a, i), pivot, user) > 0 && reverse)) {

Because cmp is an opaque function pointer, the compiler cannot assume it is
free of side effects and so cannot merge the two calls. Depending on the result
and the reverse flag, the comparator was therefore invoked up to twice per
element (always twice when reverse is set). Compute the result once into a
local and test that:

    int c = cmp(VEC_INDEX(a, i), pivot, user);
    if ((c < 0 && !reverse) || (c > 0 && reverse)) {

This halves comparator calls in the worst case and is a clear win whenever the
comparator is non-trivial (the common case for struct elements). Measured on a
shared host: ~12-14% faster for int sorting and ~30% faster with a moderately
expensive comparator. The ordering is unchanged (verified byte-for-byte).
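
To see where the "up to twice" comes from, the sketch below counts comparator calls for one partition pass under both predicates. It is a hypothetical, self-contained illustration: counting_cmp, cmp_calls, goes_left_old, goes_left_new and count_calls are illustrative names, not librz functions, and plain ints stand in for vector elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*cmp_fn)(const void *a, const void *b, void *user);
typedef bool (*pred_fn)(const void *elem, const void *pivot, cmp_fn cmp, bool reverse);

/* Number of comparator invocations since the last reset. */
static size_t cmp_calls = 0;

/* Three-way int comparator that records every call. */
static int counting_cmp(const void *a, const void *b, void *user) {
	(void)user;
	cmp_calls++;
	int x = *(const int *)a, y = *(const int *)b;
	return (x > y) - (x < y);
}

/* Old predicate: cmp runs again whenever the first disjunct is false,
 * which in reverse mode is every element. */
static bool goes_left_old(const void *elem, const void *pivot, cmp_fn cmp, bool reverse) {
	return (cmp(elem, pivot, NULL) < 0 && !reverse) ||
		(cmp(elem, pivot, NULL) > 0 && reverse);
}

/* New predicate: a single call, then only integer tests. */
static bool goes_left_new(const void *elem, const void *pivot, cmp_fn cmp, bool reverse) {
	int c = cmp(elem, pivot, NULL);
	return (c < 0 && !reverse) || (c > 0 && reverse);
}

/* Comparator calls needed to classify each value against pivot once. */
static size_t count_calls(pred_fn pred, cmp_fn cmp, const int *vals, size_t n,
	int pivot, bool reverse) {
	cmp_calls = 0;
	for (size_t i = 0; i < n; i++) {
		(void)pred(&vals[i], &pivot, cmp, reverse);
	}
	return cmp_calls;
}
```

For the values {1, 5, 3, 7, 5} against pivot 5, the old predicate makes 8 calls in ascending mode (only 1 and 3 are decided by the first call) and 10 in reverse mode, while the new one makes 5 in both, with identical decisions.
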

The index of the located slot in rz_pvector_remove_data was computed as

    size_t index = (el - (void **)vec->v.a) * sizeof(void **) / vec->v.elem_size;

For an RzPVector the element size is always sizeof(void *), so the
`* sizeof(void **) / vec->v.elem_size` factor is identically 1 and the pointer
difference `el - (void **)vec->v.a` already yields the index directly. Drop the
redundant scaling, which removes a multiply and a divide and makes the intent
clear. Behaviour is unchanged.
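
The equivalence of the two formulas can be checked with a small stand-alone model. This is a hypothetical sketch: sketch_pvec models only the two fields the formula touches (the slot buffer a and elem_size), and index_old, index_new and find_slot are illustrative names. It assumes sizeof(void **) == sizeof(void *), which holds on mainstream platforms and is what makes the old scaling factor equal to 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the RzPVector fields used by the index computation. */
typedef struct {
	void *a;          /* storage: an array of void * slots */
	size_t elem_size; /* sizeof(void *) for a pointer vector */
} sketch_pvec;

/* Previous formula: slot distance scaled by sizeof(void **) / elem_size. */
static size_t index_old(const sketch_pvec *v, void **el) {
	return (size_t)(el - (void **)v->a) * sizeof(void **) / v->elem_size;
}

/* Simplified formula: subtracting two void ** values already counts slots. */
static size_t index_new(const sketch_pvec *v, void **el) {
	return (size_t)(el - (void **)v->a);
}

/* Finds the slot holding data among the first len slots. Returns its index,
 * or (size_t)-1 when data is absent (remove_data then does nothing). */
static size_t find_slot(const sketch_pvec *v, size_t len, const void *data) {
	void **slots = v->a;
	for (void **el = slots; el < slots + len; el++) {
		if (*el == data) {
			return index_new(v, el);
		}
	}
	return (size_t)-1;
}
```
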

The existing sort tests only sorted 4-5 small elements, and there was no test for
rz_pvector_remove_data. Add coverage for the code paths exercised by the sort
changes and the remove_data cleanup:

  - test_vector_sort_large       sort 2000 heavily-duplicated ut32 values
                                 ascending and descending, verifying the result
                                 is ordered and a permutation of the input (vs a
                                 reference qsort). Exercises deep recursion and
                                 the shared scratch buffers.
  - test_vector_sort_large_elem  sort 400 elements of 304 bytes each, taking the
                                 heap-allocated scratch fallback, and check the
                                 full payload (not just the key) stays consistent
                                 through all the element moves.
  - test_pvector_remove_data     remove interior, first and last elements by
                                 value while preserving order, and confirm
                                 removing an absent value is a no-op.
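
The "ordered and a permutation of the input" check used by test_vector_sort_large can be expressed as a comparison against a reference qsort of the same input. The sketch below is hypothetical (matches_reference_sort, ut32_cmp_asc and ut32_cmp_desc are illustrative, not rz-test helpers) and assumes plain uint32_t keys: equal keys are byte-identical, so a single memcmp against the reference checks both properties at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Ascending three-way comparison of uint32_t values for qsort. */
static int ut32_cmp_asc(const void *a, const void *b) {
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
	return (x > y) - (x < y);
}

/* Descending order: the ascending comparison with arguments swapped. */
static int ut32_cmp_desc(const void *a, const void *b) {
	return ut32_cmp_asc(b, a);
}

/* True if got equals input sorted by qsort in the requested direction, which
 * implies got is ordered and is a permutation of input. */
static bool matches_reference_sort(const uint32_t *input, const uint32_t *got,
	size_t n, bool descending) {
	if (n == 0) {
		return true;
	}
	uint32_t *ref = malloc(n * sizeof(*ref));
	if (!ref) {
		return false;
	}
	memcpy(ref, input, n * sizeof(*ref));
	qsort(ref, n, sizeof(*ref), descending ? ut32_cmp_desc : ut32_cmp_asc);
	bool ok = memcmp(ref, got, n * sizeof(*ref)) == 0;
	free(ref);
	return ok;
}
```

An output that is ordered but has lost or duplicated a value (for example {1, 2, 2, 4, 9, 9} for input {4, 1, 4, 2, 2, 9}) fails the check, which an "is sorted" test alone would miss.
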

All pass on both the previous and the optimized implementation (the sort and
remove_data changes are behaviour-preserving).

bench_vector.c benchmarked only remove_at and swap. Add sort benchmarks so the
suite covers the functions touched by the sort optimizations and can be run
against the old and new librz for before/after numbers:

  - rz_vector_sort over 4k ut64 with a cheap comparator
  - rz_vector_sort over 4k ut64 with a deliberately expensive comparator
    (shows the effect of evaluating the comparator once per element)
  - rz_pvector_sort over 4k pointers (reference; pvector sort is unchanged)

Each iteration refills the buffer from an unsorted master copy via a single
memcpy before sorting; that overhead is identical across builds, so the measured
delta reflects the sort.
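
The refill-then-sort pattern can be sketched in portable C. This is a hypothetical harness, not the contents of bench_vector.c: qsort stands in for rz_vector_sort, clock() measures CPU time rather than wall time, and the 64-step busy loop in expensive_cmp is an arbitrary way to make the comparator costly. The memcpy refill is inside every timed iteration in every build, so it adds the same cost to both sides of a before/after comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BENCH_N 4096

/* Cheap three-way comparison of uint64_t values. */
static int cheap_cmp(const void *a, const void *b) {
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
	return (x > y) - (x < y);
}

/* Deliberately expensive: burns a fixed amount of work before comparing. */
static int expensive_cmp(const void *a, const void *b) {
	volatile uint64_t sink = 0;
	for (int k = 0; k < 64; k++) {
		sink += (uint64_t)k;
	}
	(void)sink;
	return cheap_cmp(a, b);
}

/* Fills master with pseudo-random keys, deterministic for a given seed. */
static void fill_master(uint64_t *master, size_t n, unsigned seed) {
	srand(seed);
	for (size_t i = 0; i < n; i++) {
		master[i] = ((uint64_t)rand() << 16) ^ (uint64_t)rand();
	}
}

/* Average CPU nanoseconds per element over iters runs. Each run restores the
 * unsorted data from master with one memcpy and then sorts work. */
static double bench_sort(const uint64_t *master, uint64_t *work, size_t n,
	int (*cmp)(const void *, const void *), int iters) {
	assert(n > 0 && iters > 0);
	clock_t start = clock();
	for (int it = 0; it < iters; it++) {
		memcpy(work, master, n * sizeof(*work));
		qsort(work, n, sizeof(*work), cmp);
	}
	clock_t end = clock();
	double ns = (double)(end - start) * 1e9 / CLOCKS_PER_SEC;
	return ns / ((double)iters * (double)n);
}
```

Running the same harness against the old and the new library, with only the sort call swapped, gives directly comparable per-element numbers.
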
@notxvilka notxvilka changed the title from librz/util/list: minor RzVector/RzPVector performance optimizations to librz/util/vector: minor RzVector/RzPVector performance optimizations Jun 6, 2026
@notxvilka notxvilka marked this pull request as ready for review June 6, 2026 20:48
@notxvilka notxvilka added this to the 0.9.0 milestone Jun 6, 2026
@notxvilka notxvilka added the optimization and performance labels Jun 6, 2026
@codecov

codecov Bot commented Jun 7, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.23%. Comparing base (d723950) to head (b1762dc).
⚠️ Report is 5 commits behind head on dev.

| File with missing lines | Patch % | Lines |
| --- | --- | --- |
| librz/util/vector.c | 66.66% | 6 missing and 2 partials ⚠️ |
Additional details and impacted files

| File with missing lines | Coverage Δ |
| --- | --- |
| librz/util/vector.c | 79.30% <66.66%> (-0.09%) ⬇️ |

... and 14 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d723950...b1762dc.



Labels

optimization, performance, rz-test, RzUtil


2 participants