Skip to content

Conversation

@uriyage
Copy link
Contributor

@uriyage uriyage commented Sep 15, 2025

Prevent iteration beyond hashtable bounds by checking if iterator is already at the end before proceeding with the main iteration loop. This adds defensive bounds checking for table index and bucket index.

@codecov
Copy link

codecov bot commented Sep 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.03%. Comparing base (a47e8fa) to head (c57878c).
⚠️ Report is 22 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2611      +/-   ##
============================================
- Coverage     72.24%   72.03%   -0.22%     
============================================
  Files           127      128       +1     
  Lines         70820    71041     +221     
============================================
+ Hits          51167    51175       +8     
- Misses        19653    19866     +213     
Files with missing lines Coverage Δ
src/hashtable.c 89.35% <100.00%> (+0.04%) ⬆️

... and 28 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Uri!

Note that this is just for making it safe to call hashtableNext again when it has already returned false. It's not really needed, but it makes it easier to detect bad usage, or is it for convenience?

Isn't calling hashtableNext() after the end a bug that we want to detect? Why not do an assert instead of graceful return then?

Prevent iteration beyond hashtable bounds by checking if iterator
is already at the end before proceeding with the main iteration loop.
This adds defensive bounds checking for table index and bucket index.

Signed-off-by: Uri Yagelnik <[email protected]>
@uriyage uriyage force-pushed the hashtable-iterator-bounds branch from 31e5445 to 6df1a9a Compare September 15, 2025 11:44
@ranshid
Copy link
Member

ranshid commented Sep 17, 2025

@uriyage let's also add unittest for this?

Signed-off-by: Uri Yagelnik <[email protected]>
@uriyage uriyage force-pushed the hashtable-iterator-bounds branch from 39f159b to c57878c Compare October 5, 2025 10:21
@uriyage
Copy link
Contributor Author

uriyage commented Oct 5, 2025

@uriyage let's also add unittest for this?

Done

Copy link
Member

@ranshid ranshid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zuiderkwast
Copy link
Contributor

Many times I've tried to reject adding dependencies from the low-level datastructures to the whole of the server globals and everything. Dependencies should be in the other direction, from high level to lower level components.

@madolson
Copy link
Member

I somewhat agree with Viktor. We can add a debugAssert to severassert.h if we want, that is just a no-op.

I was also dubious of this PR, since it's just adding a defensive check. @uriyage @ranshid Are we still moving forward with this?

@zuiderkwast
Copy link
Contributor

I somewhat agree with Viktor. We can add a debugAssert to severassert.h if we want, that is just a no-op.

We can just use assert instead. The check in the asserts are very light-weight, not even heavier than checking if the debugging is enabled.

Debug assert depends on a server config. That would prevent usage of this hashtable in non-server code (valkey-cli, valkey-benchmark).

@zuiderkwast
Copy link
Contributor

@madolson According to a commment above, Uri explained why:

We have a use case where we don't save the iterator state and attempt to run it again.

I suppose we could wait for this use case to be upstreamed and then include this change at the same time.

@madolson
Copy link
Member

Couldn't they then just save the state?

I agree, I feel like I would rather close this and re-open if that use case becomes public. I'm not sure what that case is.

Comment on lines +1939 to +1940
#define HASHTABLE_ITER_PRIMARY_TABLE 0
#define HASHTABLE_ITER_REHASH_TABLE 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These defines are unnecessary and create an inconsistency with the rest of the code. If we want to create theses (which I don't recommend) the entire file should be refactored to use them consistently.

@JimB123
Copy link
Member

JimB123 commented Oct 23, 2025

Can someone more clearly lay out the scenario here? I'm thinking that this is a case of: 1) creating a safe iterator on the hashtable, 2) deleting or moving the underlying table, 3) continuing to call hashtableNext. Is this right? If so, I don't think this solves the problem.

With a safe iterator, it's necessary to ensure that the iterator gets deleted before the underlying hashtable is deleted. If not, when we delete the iterator, we will try to re-enable rehashing on the (deleted) hashtable.

One possible way to address this is to keep a list of all "safe" iterators associated with each hashtable. When the hashtable is deleted, it could then mark the iterators as invalid. This could also be done for unsafe iterators, but I don't think that's necessary as unsafe iterators, by their nature, are created/used/deleted in a short time.

@JimB123
Copy link
Member

JimB123 commented Oct 24, 2025

I've done a review on the iterator, and these are the things that need to be addressed. There are a few potential issues with long-lived (safe) iterators. One use case for this is for defrag which iterates over hashes in an incremental manner - spanning across timer invocations.

Issue 1: Rehashing isn't paused immediately

Ref:

valkey/src/hashtable.c

Lines 2036 to 2037 in 1cf0df9

if (isSafe(iter)) {
hashtablePauseRehashing(iter->hashtable);

At this code reference, the logic indicates that if it's a safe iterator, that once we perform the first call to hashtableNext that we should pause rehashing. Rather than pausing when the iterator is created, the pause is deferred until we start iterating. This is potentially dangerous. Code, like kvstore, looks at the current rehashing paused state to determine if it's safe to delete the table. This makes it possible for: 1) iterator created, 2) kvstore deletes the table, 3) iteration begins, and the isSafe() clause dereferences the deleted hash table.

Issue 2: Empty hashtables remain in inconsistent state

Ref:

valkey/src/hashtable.c

Lines 2042 to 2044 in 1cf0df9

if (iter->hashtable->tables[0] == NULL) {
/* Empty hashtable. We're done. */
break;

With an empty hash table, the iterator returns false (indicating that iteration has completed). However, index/table still remain in the initial state - as if iteration has not yet begun. This makes it possible for an iterator to return false (indicating completion) but then later, if called again, begin returning newly added values. This likely isn't dangerous, because the caller is unlikely to call hashtableNext a second time after it has already returned false. But it is inconsistent, and easily fixed.

Issue 3: Deletion of hashtable makes hashtableNext unsafe

Ref:

valkey/src/hashtable.c

Lines 2054 to 2063 in 1cf0df9

/* Advance to the next position within the bucket, or to the next
* child bucket in a chain, or to the next bucket index, or to the
* next table. */
iter->pos_in_bucket++;
if (iter->bucket->chained && iter->pos_in_bucket >= ENTRIES_PER_BUCKET - 1) {
iter->pos_in_bucket = 0;
iter->bucket = getChildBucket(iter->bucket);
} else if (iter->pos_in_bucket >= ENTRIES_PER_BUCKET) {
/* Bucket index done. */
if (isSafe(iter)) {

Once the iterator has started, hashtableNext relies on both iter->bucket and iter->hashtable to remain valid. It looks like iter->bucket is ok - it's handled internally that buckets aren't deleted if rehashing is paused. However there's no protection for iter->hashtable. This requires that every user of a safe iterator must be aware of hashtable deletion and avoid calling hashtableNext once the table has been deleted.

I'll suggest that the proper solution is to have a hashtable maintain a list of safe iterators. If the hashtable gets deleted, each of the iterators should be marked as completed such that future calls to hashtableNext will simply return false, without accessing iter->hashtable.

Issue 4: Deletion of hashtable prevents deletion of the iterator

Ref:

valkey/src/hashtable.c

Lines 2023 to 2024 in 1cf0df9

void hashtableReleaseIterator(hashtableIterator *iterator) {
hashtableResetIterator(iterator);

This is fairly serious. As it stands, every user of a safe iterator must be aware of a hashtable deletion BEFORE it happens. Once the hashtable has been deleted, it becomes impossible to safely delete the iterator. The iterator deletion attempts to unpause rehashing and accesses the (deleted) hashtable.

I'll suggest that the solution for this is identical to the proposed solution for issue 3. Once the iterator is marked as completed, it should not attempt to access iter->hashtable at deletion time.

@zuiderkwast @rainsupreme - please review.

@JimB123 JimB123 requested a review from rainsupreme October 24, 2025 16:41
@rainsupreme
Copy link
Contributor

That's a pretty thorough analysis Jim! I guess there are maybe a few scenarios in OSS where it feels like we might delete the hashtable while the iterator exists, though I haven't fully investigated these:

  • pubsub: client disconnect while iterating unsubscribe all?
  • vset: could be deleted while processing expiry?
  • modules: subcommands hashtable could be freed while being iterated - can modules add/remove commands or be loaded/unloaded at any time?

Adding tracking for safe iterator invalidation should be pretty lightweight in terms of performance and memory. I think we should add it, with unit tests. Also, I'm pretty sure this issue exists for dict's safe iterator as well - dict will be worth fixing if hashtable is.

Copy link
Contributor

@rainsupreme rainsupreme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should invalidate safe iterators for hashtable and dict when the underlying collection is deleted, to prevent invalid memory access and undefined states.

@JimB123
Copy link
Member

JimB123 commented Oct 27, 2025

One other potential issue is with defragmentation. If the hashtable is moved during defragmentation, any active safe iterators will access the deleted memory. I don't think this can happen with defrag's own iterator, but if there's a potential for other iterators, this might be a concern. (e.g. forkless save?)

@zuiderkwast
Copy link
Contributor

I think we should fix these bugs found by Jim in a separate PR. Essentially, each hashtable needs to keep track of a list of the safe iterators. (The same problems exist with dict, I imagine.)

However, I don't think we allow long-lived iterators currently. Only a few operations are allowed while iterating (i.e. while the iterator exists) so letting the event loop run and later resume iteration is not currently supported, as I read these doc comments: https://github.com/valkey-io/valkey/blob/9.0/src/hashtable.c#L1960-L1965

@rainsupreme
Copy link
Contributor

rainsupreme commented Oct 28, 2025

I think this falls under the same category as ensuring someone doesn't call hashtableNext after it's returned false, but I don't mind if it's a separate PR.

At least then let's fix the doc comments to clarify this: Other activities (like event loop, defrag, threads) that could defrag/delete the hashtable must not be allowed to run while the iterator is considered valid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants