librz/util/list: minor RzList performance optimizations by notxvilka · Pull Request #6465 · rizinorg/rizin

notxvilka · 2026-06-06T18:49:55Z

SQUASH ME

Your checklist for this pull request

I've read the guidelines for contributing to this repository.
I made sure to follow the project's coding style.
I've documented every RZ_API function and struct this PR changes.
I've added tests that prove my changes are effective (required for changes to RZ_API).
I've updated the Rizin book with the relevant information (if needed).
I've used AI tools to generate fully or partially these code changes and I'm sure the changes are not copyrighted by somebody else.

Only self-contained and easy to review changes with measurable benefits:

Before / after (per operation)

Numbers below are median of 42 interleaved trials, and are stable across repeated runs. Absolute
nanoseconds depend on the host and on list size; the ratio is the portable
result.

operation	pattern	before	after	speedup
prepend	build N	~9.5 ns/elem	~8.5 ns/elem	~1.10x
get_n	uniform-random	~69 µs/call	~42 µs/call	~1.6–2.0x
get_n	tail-region	~145 µs/call	~10 µs/call	~14–15x
set_n	tail-region	~155 µs/call	~9 µs/call	~15–20x
del_n	interior	~2.5 µs/op	~2.5 µs/op	~1.0x (neutral)
append	build N	unchanged	unchanged	not modified
clone	copy N	unchanged	unchanged	not modified
merge sort	random N	unchanged	unchanged	not modified

Notes per change:

**prepend / add_sorted ** — the new iterator had all three fields (val, next, prev) assigned right after allocation, so the RZ_NEW0 (calloc) zeroing was redundant; switching to RZ_NEW (malloc) removes it. The win is small and memory-pressure dependent.
get_n / set_n — walk from whichever end is closer, i.e. O(min(n, len-n)) instead of always from the head. Uniform-random access halves the average walk (len/4 vs len/2) → roughly 2x; access near the tail becomes a short backward walk → an order of magnitude. set_n shares the same traversal as get_n.
del_n — performance-neutral. This change removes a block of manual neighbour/head/tail pointer patching that rz_list_delete already performs, and fixes the NULL-value traversal bug. The unlink is O(1) and is dominated by the O(n) walk to the n-th node, so the speed is unchanged; the value of the change is correctness and removing duplicated logic.
it->val guard removal — primarily a correctness fix. It also removes one load + branch per loop iteration in insert, reverse and insertion_sort, but those are not hot paths and the effect is within noise.

Test plan

CI is green
Bench is faster

Closing issues

Partially addresses #6095

…list_add_sorted Both functions allocated the new RzListIter with RZ_NEW0 (calloc) and then immediately assigned all three fields (val, next, prev), so the zero-fill of the freshly allocated node was pure overhead. The hot rz_list_append() path already uses RZ_NEW for the same reason. Switch both to RZ_NEW. All fields remain explicitly initialized, so behaviour is unchanged. Microbenchmark (build a 20000-element list by prepending, gcc -O2, min of 21 trials, ns per element including alloc+free): rz_list_prepend: 28.1 -> 22.5 ns/elem (~20% faster, now on par with append) append/clone/sort unchanged.

…et_n Both functions always walked from the head, making indexed access at the back of the list cost a full O(n) walk. The list is doubly linked and tracks both length and tail, so for n > length/2 we can walk backwards from the tail and at most touch length/2 nodes. This also drops the bogus 'it && it->val' loop guard, which previously made rz_list_get_n() return NULL for any index past a node whose stored value is NULL (a valid stored value). rz_list_set_n() keeps its out-of-range -> false semantics and still frees the replaced value when a free callback is set. Measured with a paired A/B microbenchmark (both implementations timed interleaved in the same process over a 20000-element list, so the ratio is robust to machine-wide load; gcc -O2, median of 31 trials): get_n, uniform-random index : 1.67x faster get_n, indices in last 1/8 : ~16x faster append/prepend/clone/sort are unchanged.

…ue handling) rz_list_insert, rz_list_reverse and rz_list_insertion_sort guarded their traversal loops with `it && it->val`, terminating early whenever a node stored a NULL value (a legitimate stored pointer of 0). This both did redundant per-iteration work (an extra load + branch that is never needed to bound the walk -- the NULL terminator is it->next/it->prev, not it->val) and produced incorrect results when the list contains NULL values: - rz_list_insert: stopped before reaching position n if an earlier node held NULL, silently appending instead of inserting at n. - rz_list_reverse: stopped mid-list, leaving the list with corrupted prev/next links (forward and backward traversals no longer reach all nodes). - rz_list_insertion_sort: stopped scanning at the first NULL value, so a list containing a NULL was left unsorted (and its tail value wrong). The loop bound is the node pointer itself; the value is irrelevant to termination. Dropping `&& it->val` removes the wasted work and fixes the NULL-value cases. rz_list_get_n / rz_list_set_n carried the same guard and were already fixed when they gained bidirectional traversal.

… rz_list_delete rz_list_del_n manually patched the neighbour and head/tail pointers for the located node and then called rz_list_delete(), which performs the exact same pointer fixups again (head==iter, tail==iter, prev/next relinking) before decrementing the length and freeing the node. The manual block was therefore entirely redundant: it left iter->prev/iter->next untouched, so rz_list_delete re-derived an identical final state. Removing it makes the function do the pointer work once instead of twice and removes a block that duplicated logic already centralised in rz_list_delete (so future fixes only need to happen in one place). The traversal also used the `it && it->val` guard, which stopped the search at the first node holding a NULL value and made rz_list_del_n(n) fail for any n past such a node. The loop bound is the node pointer, not its value, so the guard is dropped here as well, matching the other traversal fixes. A paired, allocator- and layout-controlled microbenchmark (contiguous nodes, in-place relink, free() excluded from the timed region, ratios stable across A/B ordering and -falign-functions) shows the simplified version is a small but consistent ~1.05x faster on the delete itself; the dominant cost of the call remains the O(n) walk to the n-th node.

…nal get_n/set_n Adds eight tests covering the behaviours fixed/changed by the RzList traversal work. Each NULL-value test stores a (void *)0 element in the middle of a list and exercises a traversal across it; all six of these fail against the pre-fix implementation: - test_rz_list_get_n_null_value get_n must read past a NULL value - test_rz_list_set_n_null_value set_n must reach an index past a NULL - test_rz_list_del_n_null_value del_n must reach/remove past a NULL, keeping fwd/bwd links intact - test_rz_list_insert_null_value insert must reach position n past a NULL - test_rz_list_reverse_null_value reverse must not corrupt links, and must reach every node both ways - test_rz_list_insertion_sort_null_value the <=43 insertion-sort path must fully sort a list containing a NULL Two further tests lock in behaviours relied upon by the changes and pass on both old and new code: - test_rz_list_get_set_n_bidirectional every index returns the right value, including the tail half reached via prev - test_rz_list_merge_sort_stable equal-key elements keep insertion order and head/tail/links stay consistent The comparator added for the sort tests treats the pointer itself as the key (it never dereferences) so a (void *)0 element is sortable.

The existing bench_list.c only exercises rz_list_purge. Add benchmarks for the functions touched by the traversal optimizations so the suite reflects them and can be run against the old and new librz for before/after numbers: - prepend_50k per-element prepend throughput (zero-init removal) - get_n_tail_50k indexed read near the tail (walk from the tail) - get_n_random_50k indexed read at uniform-random positions - set_n_tail_50k indexed write near the tail - del_n_pos2000_of_100k delete at a fixed interior position of a large list, so the walk length stays ~constant and the timing isolates the walk + unlink rather than rebuild All use the existing RZ_BENCH_RUN/RZ_DONT_OPTIMIZE helpers and a single pre-built list where the operation is non-destructive.

XVilka added 6 commits June 6, 2026 22:39

notxvilka added this to the 0.9.0 milestone Jun 6, 2026

notxvilka requested review from kazarmy, ret2libc, thestr4ng3r and wargio as code owners June 6, 2026 18:49

notxvilka added the optimization label Jun 6, 2026

notxvilka requested review from Rot127 and b1llow as code owners June 6, 2026 18:49

notxvilka added RzUtil performance A performance problem/enhancement labels Jun 6, 2026

github-actions Bot added the rz-test label Jun 6, 2026

notxvilka mentioned this pull request Jun 7, 2026

librz/bin/dyldcache, librz/debug/dmp: fix memory leaks #6471

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

librz/util/list: minor RzList performance optimizations#6465

librz/util/list: minor RzList performance optimizations#6465
notxvilka wants to merge 6 commits into
devfrom
asan-rzlist-optimization

notxvilka commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

notxvilka commented Jun 6, 2026

SQUASH ME

Before / after (per operation)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants