Skip to content

librz/util/list: minor RzList performance optimizations#6465

Open
notxvilka wants to merge 6 commits into
devfrom
asan-rzlist-optimization
Open

librz/util/list: minor RzList performance optimizations#6465
notxvilka wants to merge 6 commits into
devfrom
asan-rzlist-optimization

Conversation

@notxvilka
Copy link
Copy Markdown
Contributor

SQUASH ME

Your checklist for this pull request

  • I've read the guidelines for contributing to this repository.
  • I made sure to follow the project's coding style.
  • I've documented every RZ_API function and struct this PR changes.
  • I've added tests that prove my changes are effective (required for changes to RZ_API).
  • I've updated the Rizin book with the relevant information (if needed).
  • I've used AI tools to generate fully or partially these code changes and I'm sure the changes are not copyrighted by somebody else.

Only self-contained and easy to review changes with measurable benefits:

Before / after (per operation)

Numbers below are median of 42 interleaved trials, and are stable across repeated runs. Absolute
nanoseconds depend on the host and on list size; the ratio is the portable
result.

operation pattern before after speedup
prepend build N ~9.5 ns/elem ~8.5 ns/elem ~1.10x
get_n uniform-random ~69 µs/call ~42 µs/call ~1.6–2.0x
get_n tail-region ~145 µs/call ~10 µs/call ~14–15x
set_n tail-region ~155 µs/call ~9 µs/call ~15–20x
del_n interior ~2.5 µs/op ~2.5 µs/op ~1.0x (neutral)
append build N unchanged unchanged not modified
clone copy N unchanged unchanged not modified
merge sort random N unchanged unchanged not modified

Notes per change:

  • **prepend / add_sorted ** — the new iterator had all three fields (val, next, prev) assigned right after allocation, so the RZ_NEW0 (calloc) zeroing was redundant; switching to RZ_NEW (malloc) removes it. The win is small and memory-pressure dependent.

  • get_n / set_n — walk from whichever end is closer, i.e. O(min(n, len-n)) instead of always from the head. Uniform-random access halves the average walk (len/4 vs len/2) → roughly 2x; access near the tail becomes a short backward walk → an order of magnitude. set_n shares the same traversal as get_n.

  • del_nperformance-neutral. This change removes a block of manual neighbour/head/tail pointer patching that rz_list_delete already performs, and fixes the NULL-value traversal bug. The unlink is O(1) and is dominated by the O(n) walk to the n-th node, so the speed is unchanged; the value of the change is correctness and removing duplicated logic.

  • it->val guard removal — primarily a correctness fix. It also removes one load + branch per loop iteration in insert, reverse and insertion_sort, but those are not hot paths and the effect is within noise.

Test plan

  • CI is green
  • Bench is faster

Closing issues

Partially addresses #6095

XVilka added 6 commits June 6, 2026 22:39
…list_add_sorted

Both functions allocated the new RzListIter with RZ_NEW0 (calloc) and then
immediately assigned all three fields (val, next, prev), so the zero-fill of
the freshly allocated node was pure overhead. The hot rz_list_append() path
already uses RZ_NEW for the same reason.

Switch both to RZ_NEW. All fields remain explicitly initialized, so behaviour
is unchanged.

Microbenchmark (build a 20000-element list by prepending, gcc -O2, min of 21
trials, ns per element including alloc+free):

    rz_list_prepend:  28.1 -> 22.5 ns/elem  (~20% faster, now on par with append)

append/clone/sort unchanged.
…et_n

Both functions always walked from the head, making indexed access at the back
of the list cost a full O(n) walk. The list is doubly linked and tracks both
length and tail, so for n > length/2 we can walk backwards from the tail and at
most touch length/2 nodes.

This also drops the bogus 'it && it->val' loop guard, which previously made
rz_list_get_n() return NULL for any index past a node whose stored value is
NULL (a valid stored value). rz_list_set_n() keeps its out-of-range -> false
semantics and still frees the replaced value when a free callback is set.

Measured with a paired A/B microbenchmark (both implementations timed
interleaved in the same process over a 20000-element list, so the ratio is
robust to machine-wide load; gcc -O2, median of 31 trials):

    get_n, uniform-random index : 1.67x faster
    get_n, indices in last 1/8  : ~16x faster

append/prepend/clone/sort are unchanged.
…ue handling)

rz_list_insert, rz_list_reverse and rz_list_insertion_sort guarded their
traversal loops with `it && it->val`, terminating early whenever a node
stored a NULL value (a legitimate stored pointer of 0). This both did
redundant per-iteration work (an extra load + branch that is never needed
to bound the walk -- the NULL terminator is it->next/it->prev, not it->val)
and produced incorrect results when the list contains NULL values:

  - rz_list_insert: stopped before reaching position n if an earlier node
    held NULL, silently appending instead of inserting at n.
  - rz_list_reverse: stopped mid-list, leaving the list with corrupted
    prev/next links (forward and backward traversals no longer reach all
    nodes).
  - rz_list_insertion_sort: stopped scanning at the first NULL value, so a
    list containing a NULL was left unsorted (and its tail value wrong).

The loop bound is the node pointer itself; the value is irrelevant to
termination. Dropping `&& it->val` removes the wasted work and fixes the
NULL-value cases. rz_list_get_n / rz_list_set_n carried the same guard and
were already fixed when they gained bidirectional traversal.
… rz_list_delete

rz_list_del_n manually patched the neighbour and head/tail pointers for the
located node and then called rz_list_delete(), which performs the exact same
pointer fixups again (head==iter, tail==iter, prev/next relinking) before
decrementing the length and freeing the node. The manual block was therefore
entirely redundant: it left iter->prev/iter->next untouched, so rz_list_delete
re-derived an identical final state. Removing it makes the function do the
pointer work once instead of twice and removes a block that duplicated logic
already centralised in rz_list_delete (so future fixes only need to happen in
one place).

The traversal also used the `it && it->val` guard, which stopped the search at
the first node holding a NULL value and made rz_list_del_n(n) fail for any n
past such a node. The loop bound is the node pointer, not its value, so the
guard is dropped here as well, matching the other traversal fixes.

A paired, allocator- and layout-controlled microbenchmark (contiguous nodes,
in-place relink, free() excluded from the timed region, ratios stable across
A/B ordering and -falign-functions) shows the simplified version is a small
but consistent ~1.05x faster on the delete itself; the dominant cost of the
call remains the O(n) walk to the n-th node.
…nal get_n/set_n

Adds eight tests covering the behaviours fixed/changed by the RzList
traversal work. Each NULL-value test stores a (void *)0 element in the
middle of a list and exercises a traversal across it; all six of these
fail against the pre-fix implementation:

  - test_rz_list_get_n_null_value         get_n must read past a NULL value
  - test_rz_list_set_n_null_value         set_n must reach an index past a NULL
  - test_rz_list_del_n_null_value         del_n must reach/remove past a NULL,
                                          keeping fwd/bwd links intact
  - test_rz_list_insert_null_value        insert must reach position n past a NULL
  - test_rz_list_reverse_null_value       reverse must not corrupt links, and
                                          must reach every node both ways
  - test_rz_list_insertion_sort_null_value  the <=43 insertion-sort path must
                                          fully sort a list containing a NULL

Two further tests lock in behaviours relied upon by the changes and pass on
both old and new code:

  - test_rz_list_get_set_n_bidirectional  every index returns the right value,
                                          including the tail half reached via prev
  - test_rz_list_merge_sort_stable        equal-key elements keep insertion order
                                          and head/tail/links stay consistent

The comparator added for the sort tests treats the pointer itself as the key
(it never dereferences) so a (void *)0 element is sortable.
The existing bench_list.c only exercises rz_list_purge. Add benchmarks for
the functions touched by the traversal optimizations so the suite reflects
them and can be run against the old and new librz for before/after numbers:

  - prepend_50k            per-element prepend throughput (zero-init removal)
  - get_n_tail_50k         indexed read near the tail (walk from the tail)
  - get_n_random_50k       indexed read at uniform-random positions
  - set_n_tail_50k         indexed write near the tail
  - del_n_pos2000_of_100k  delete at a fixed interior position of a large list,
                           so the walk length stays ~constant and the timing
                           isolates the walk + unlink rather than rebuild

All use the existing RZ_BENCH_RUN/RZ_DONT_OPTIMIZE helpers and a single
pre-built list where the operation is non-destructive.
@notxvilka notxvilka added this to the 0.9.0 milestone Jun 6, 2026
@notxvilka notxvilka requested review from Rot127 and b1llow as code owners June 6, 2026 18:49
@notxvilka notxvilka added RzUtil performance A performance problem/enhancement labels Jun 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimization performance A performance problem/enhancement rz-test RzUtil

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants