librz/util/list: minor RzList performance optimizations#6465
Open
notxvilka wants to merge 6 commits into
Open
Conversation
…list_add_sorted
Both functions allocated the new RzListIter with RZ_NEW0 (calloc) and then
immediately assigned all three fields (val, next, prev), so the zero-fill of
the freshly allocated node was pure overhead. The hot rz_list_append() path
already uses RZ_NEW for the same reason.
Switch both to RZ_NEW. All fields remain explicitly initialized, so behaviour
is unchanged.
Microbenchmark (build a 20000-element list by prepending, gcc -O2, min of 21
trials, ns per element including alloc+free):
rz_list_prepend: 28.1 -> 22.5 ns/elem (~20% faster, now on par with append)
append/clone/sort unchanged.
…et_n
Both functions always walked from the head, making indexed access at the back
of the list cost a full O(n) walk. The list is doubly linked and tracks both
length and tail, so for n > length/2 we can walk backwards from the tail and at
most touch length/2 nodes.
This also drops the bogus 'it && it->val' loop guard, which previously made
rz_list_get_n() return NULL for any index past a node whose stored value is
NULL (a valid stored value). rz_list_set_n() keeps its out-of-range -> false
semantics and still frees the replaced value when a free callback is set.
Measured with a paired A/B microbenchmark (both implementations timed
interleaved in the same process over a 20000-element list, so the ratio is
robust to machine-wide load; gcc -O2, median of 31 trials):
get_n, uniform-random index : 1.67x faster
get_n, indices in last 1/8 : ~16x faster
append/prepend/clone/sort are unchanged.
…ue handling)
rz_list_insert, rz_list_reverse and rz_list_insertion_sort guarded their
traversal loops with `it && it->val`, terminating early whenever a node
stored a NULL value (a legitimate stored pointer of 0). This both did
redundant per-iteration work (an extra load + branch that is never needed
to bound the walk -- the NULL terminator is it->next/it->prev, not it->val)
and produced incorrect results when the list contains NULL values:
- rz_list_insert: stopped before reaching position n if an earlier node
held NULL, silently appending instead of inserting at n.
- rz_list_reverse: stopped mid-list, leaving the list with corrupted
prev/next links (forward and backward traversals no longer reach all
nodes).
- rz_list_insertion_sort: stopped scanning at the first NULL value, so a
list containing a NULL was left unsorted (and its tail value wrong).
The loop bound is the node pointer itself; the value is irrelevant to
termination. Dropping `&& it->val` removes the wasted work and fixes the
NULL-value cases. rz_list_get_n / rz_list_set_n carried the same guard and
were already fixed when they gained bidirectional traversal.
… rz_list_delete rz_list_del_n manually patched the neighbour and head/tail pointers for the located node and then called rz_list_delete(), which performs the exact same pointer fixups again (head==iter, tail==iter, prev/next relinking) before decrementing the length and freeing the node. The manual block was therefore entirely redundant: it left iter->prev/iter->next untouched, so rz_list_delete re-derived an identical final state. Removing it makes the function do the pointer work once instead of twice and removes a block that duplicated logic already centralised in rz_list_delete (so future fixes only need to happen in one place). The traversal also used the `it && it->val` guard, which stopped the search at the first node holding a NULL value and made rz_list_del_n(n) fail for any n past such a node. The loop bound is the node pointer, not its value, so the guard is dropped here as well, matching the other traversal fixes. A paired, allocator- and layout-controlled microbenchmark (contiguous nodes, in-place relink, free() excluded from the timed region, ratios stable across A/B ordering and -falign-functions) shows the simplified version is a small but consistent ~1.05x faster on the delete itself; the dominant cost of the call remains the O(n) walk to the n-th node.
…nal get_n/set_n
Adds eight tests covering the behaviours fixed/changed by the RzList
traversal work. Each NULL-value test stores a (void *)0 element in the
middle of a list and exercises a traversal across it; all six of these
fail against the pre-fix implementation:
- test_rz_list_get_n_null_value get_n must read past a NULL value
- test_rz_list_set_n_null_value set_n must reach an index past a NULL
- test_rz_list_del_n_null_value del_n must reach/remove past a NULL,
keeping fwd/bwd links intact
- test_rz_list_insert_null_value insert must reach position n past a NULL
- test_rz_list_reverse_null_value reverse must not corrupt links, and
must reach every node both ways
- test_rz_list_insertion_sort_null_value the <=43 insertion-sort path must
fully sort a list containing a NULL
Two further tests lock in behaviours relied upon by the changes and pass on
both old and new code:
- test_rz_list_get_set_n_bidirectional every index returns the right value,
including the tail half reached via prev
- test_rz_list_merge_sort_stable equal-key elements keep insertion order
and head/tail/links stay consistent
The comparator added for the sort tests treats the pointer itself as the key
(it never dereferences) so a (void *)0 element is sortable.
The existing bench_list.c only exercises rz_list_purge. Add benchmarks for
the functions touched by the traversal optimizations so the suite reflects
them and can be run against the old and new librz for before/after numbers:
- prepend_50k per-element prepend throughput (zero-init removal)
- get_n_tail_50k indexed read near the tail (walk from the tail)
- get_n_random_50k indexed read at uniform-random positions
- set_n_tail_50k indexed write near the tail
- del_n_pos2000_of_100k delete at a fixed interior position of a large list,
so the walk length stays ~constant and the timing
isolates the walk + unlink rather than rebuild
All use the existing RZ_BENCH_RUN/RZ_DONT_OPTIMIZE helpers and a single
pre-built list where the operation is non-destructive.
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
SQUASH ME
Your checklist for this pull request
RZ_APIfunction and struct this PR changes.RZ_API).Only self-contained and easy to review changes with measurable benefits:
Before / after (per operation)
Numbers below are median of 42 interleaved trials, and are stable across repeated runs. Absolute
nanoseconds depend on the host and on list size; the ratio is the portable
result.
Notes per change:
**prepend / add_sorted ** — the new iterator had all three fields (
val,next,prev) assigned right after allocation, so theRZ_NEW0(calloc) zeroing was redundant; switching toRZ_NEW(malloc) removes it. The win is small and memory-pressure dependent.get_n / set_n — walk from whichever end is closer, i.e.
O(min(n, len-n))instead of always from the head. Uniform-random access halves the average walk (len/4 vs len/2) → roughly 2x; access near the tail becomes a short backward walk → an order of magnitude.set_nshares the same traversal asget_n.del_n — performance-neutral. This change removes a block of manual neighbour/head/tail pointer patching that
rz_list_deletealready performs, and fixes the NULL-value traversal bug. The unlink is O(1) and is dominated by the O(n) walk to the n-th node, so the speed is unchanged; the value of the change is correctness and removing duplicated logic.it->val guard removal — primarily a correctness fix. It also removes one load + branch per loop iteration in
insert,reverseandinsertion_sort, but those are not hot paths and the effect is within noise.Test plan
Closing issues
Partially addresses #6095