hashtable: add bounds check in hashtableNext iterator #2611
Base branch: unstable
Conversation
Force-pushed from e54a9cf to 31e5445.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff              @@
##           unstable    #2611      +/-   ##
============================================
- Coverage     72.24%   72.03%   -0.22%     
============================================
  Files           127      128       +1     
  Lines         70820    71041     +221     
============================================
+ Hits          51167    51175       +8     
- Misses        19653    19866     +213     
Thanks Uri!
Note that this is just for making it safe to call hashtableNext again when it has already returned false. It's not really needed; is it meant to make it easier to detect bad usage, or is it for convenience?
Isn't calling hashtableNext() after the end a bug that we want to detect? Why not do an assert instead of graceful return then?
Prevent iteration beyond hashtable bounds by checking if iterator is already at the end before proceeding with the main iteration loop. This adds defensive bounds checking for table index and bucket index. Signed-off-by: Uri Yagelnik <[email protected]>
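To illustrate the shape of that guard, here is a minimal self-contained sketch; the struct and function are simplified stand-ins invented for this example, not the PR's actual diff:

#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the real iterator; the actual struct in
 * hashtable.c has more fields. */
typedef struct {
    size_t table;        /* 0 = primary table, 1 = rehash target */
    size_t num_tables;   /* 2 while rehashing, else 1 */
    size_t bucket_index; /* current bucket within the table */
    size_t num_buckets;  /* buckets in the current table */
} toyIter;

/* The guard the commit message describes: if the iterator has already
 * run past the last table or bucket, return false immediately instead
 * of entering the main loop with out-of-range indexes. */
bool toyNext(toyIter *it) {
    if (it->table >= it->num_tables || it->bucket_index >= it->num_buckets) {
        return false; /* already at the end; safe to call repeatedly */
    }
    /* ... the main iteration loop would go here ... */
    it->bucket_index++;
    return true;
}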
Force-pushed from 31e5445 to 6df1a9a.
@uriyage let's also add a unit test for this?
Signed-off-by: Uri Yagelnik <[email protected]>
Force-pushed from 39f159b to c57878c.
Done
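For context, a sketch of what such a test could look like. The create/add/iterate calls follow the names declared in hashtable.h, but the exact signatures and the hashtableType setup are assumptions here, not taken from the PR:

#include <assert.h>
#include "hashtable.h"

/* Hedged sketch: drain an iterator, then verify that one extra call
 * after the end still returns false instead of reading out of bounds.
 * 'testType' and 'entry' are assumed to be set up elsewhere. */
static void test_next_after_end(hashtableType *testType, void *entry) {
    hashtable *ht = hashtableCreate(testType);
    hashtableAdd(ht, entry);
    hashtableIterator iter;
    hashtableInitIterator(&iter, ht);
    void *next;
    while (hashtableNext(&iter, &next)) {
        /* drain all entries */
    }
    /* With the bounds check, a second call past the end is safe: */
    assert(!hashtableNext(&iter, &next));
    hashtableResetIterator(&iter);
    hashtableRelease(ht);
}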
LGTM
Many times I've tried to reject adding dependencies from the low-level data structures to the whole of the server globals and everything. Dependencies should go in the other direction, from high-level to lower-level components.
"We can just use …"

A debug assert depends on a server config. That would prevent usage of this hashtable in non-server code (valkey-cli, valkey-benchmark).
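As an illustration of the dependency concern, one hypothetical way to keep an assert without pulling in server globals; this macro is invented for the example, not something proposed in the PR:

#include <assert.h>

/* Hypothetical: a file-local assert for hashtable.c, enabled by a plain
 * compile-time flag rather than a server config, so the file remains
 * usable from valkey-cli and valkey-benchmark. */
#ifdef HASHTABLE_DEBUG
#define hashtableAssert(cond) assert(cond)
#else
#define hashtableAssert(cond) ((void)0)
#endif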
@madolson According to a comment above, Uri explained why. I suppose we could wait for this use case to be upstreamed and then include this change at the same time.
Couldn't they then just save the state? I agree, I feel like I would rather close this and re-open if that use case becomes public. I'm not sure what that case is.
#define HASHTABLE_ITER_PRIMARY_TABLE 0
#define HASHTABLE_ITER_REHASH_TABLE 1
These defines are unnecessary and create an inconsistency with the rest of the code. If we want to add these (which I don't recommend), the entire file should be refactored to use them consistently.
Can someone more clearly lay out the scenario here? I'm thinking that this is a case of: 1) creating a safe iterator on the hashtable, 2) deleting or moving the underlying table, 3) continuing to call hashtableNext. Is this right?

If so, I don't think this solves the problem. With a safe iterator, it's necessary to ensure that the iterator gets deleted before the underlying hashtable is deleted. If not, when we delete the iterator, we will try to re-enable rehashing on the (deleted) hashtable.

One possible way to address this is to keep a list of all "safe" iterators associated with each hashtable. When the hashtable is deleted, it could then mark the iterators as invalid. This could also be done for unsafe iterators, but I don't think that's necessary, as unsafe iterators, by their nature, are created/used/deleted in a short time.
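To make the scenario concrete, a hedged sketch of the sequence being described (names follow hashtable.h's dict-style API; this is illustrative, not code from the PR):

#include "hashtable.h"

/* The hazard: a safe iterator that outlives its hashtable. */
void hazard(hashtable *ht) {
    hashtableIterator iter;
    void *entry;
    hashtableInitSafeIterator(&iter, ht); /* 1) safe iterator created */
    hashtableRelease(ht);                 /* 2) underlying table deleted */
    hashtableNext(&iter, &entry);         /* 3) reads freed memory */
    hashtableResetIterator(&iter);        /* tries to re-enable rehashing
                                           * on the freed hashtable */
}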
I've done a review on the iterator, and these are the things that need to be addressed. There are a few potential issues with long-lived (safe) iterators. One use case for this is defrag, which iterates over hashes in an incremental manner, spanning across timer invocations.

Issue 1: Rehashing isn't paused immediately
Ref: Lines 2036 to 2037 in 1cf0df9

At this code reference, the logic indicates that if it's a safe iterator, rehashing is only paused once we perform the first call to hashtableNext. Until that first call, rehashing can still proceed.

Issue 2: Empty hashtables remain in inconsistent state
Ref: Lines 2042 to 2044 in 1cf0df9

With an empty hash table, the iterator returns false (indicating that iteration has completed). However, index/table still remain in the initial state, as if iteration has not yet begun. This makes it possible for an iterator to return false (indicating completion) but then later, if called again, begin returning newly added values. This likely isn't dangerous, because the caller is unlikely to call hashtableNext again after it has returned false.
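A hedged repro sketch of Issue 2, under the same API assumptions as the sketch above:

#include "hashtable.h"

/* Issue 2: on an empty table, "false" does not actually finalize the
 * iterator, so it can come back to life after an insert. */
void issue2(hashtable *ht /* empty */, void *e) {
    hashtableIterator iter;
    void *entry;
    hashtableInitSafeIterator(&iter, ht);
    int r1 = hashtableNext(&iter, &entry); /* false: looks complete */
    hashtableAdd(ht, e);
    int r2 = hashtableNext(&iter, &entry); /* may now return the new entry,
                                            * because index/table were never
                                            * advanced past the start */
    (void)r1; (void)r2;
    hashtableResetIterator(&iter);
}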
Issue 3: Deletion of hashtable makes hashtableNext unsafe

/* Advance to the next position within the bucket, or to the next
 * child bucket in a chain, or to the next bucket index, or to the
 * next table. */
iter->pos_in_bucket++;
if (iter->bucket->chained && iter->pos_in_bucket >= ENTRIES_PER_BUCKET - 1) {
    iter->pos_in_bucket = 0;
    iter->bucket = getChildBucket(iter->bucket);
} else if (iter->pos_in_bucket >= ENTRIES_PER_BUCKET) {
    /* Bucket index done. */
    if (isSafe(iter)) {
Once the iterator has started, hashtableNext relies on both iter->bucket and iter->hashtable to remain valid.  It looks like iter->bucket is ok - it's handled internally that buckets aren't deleted if rehashing is paused.  However there's no protection for iter->hashtable.  This requires that every user of a safe iterator must be aware of hashtable deletion and avoid calling hashtableNext once the table has been deleted.
I'll suggest that the proper solution is to have a hashtable maintain a list of safe iterators.  If the hashtable gets deleted, each of the iterators should be marked as completed such that future calls to hashtableNext will simply return false, without accessing iter->hashtable.
Issue 4: Deletion of hashtable prevents deletion of the iterator
Ref:
Lines 2023 to 2024 in 1cf0df9
void hashtableReleaseIterator(hashtableIterator *iterator) {
    hashtableResetIterator(iterator);
This is fairly serious. As it stands, every user of a safe iterator must be aware of a hashtable deletion BEFORE it happens. Once the hashtable has been deleted, it becomes impossible to safely delete the iterator. The iterator deletion attempts to unpause rehashing and accesses the (deleted) hashtable.
I'll suggest that the solution for this is identical to the proposed solution for issue 3.  Once the iterator is marked as completed, it should not attempt to access iter->hashtable at deletion time.
@zuiderkwast @rainsupreme - please review.
That's a pretty thorough analysis, Jim! I guess there are maybe a few scenarios in OSS where it feels like we might delete the hashtable while the iterator exists, though I haven't fully investigated these.

Adding tracking for safe iterator invalidation should be pretty lightweight in terms of performance and memory. I think we should add it, with unit tests. Also, I'm pretty sure this issue exists for dict's safe iterator as well - dict will be worth fixing if hashtable is.
We should invalidate safe iterators for hashtable and dict when the underlying collection is deleted, to prevent invalid memory access and undefined states.
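A self-contained sketch of what such invalidation tracking could look like; every name here is a toy stand-in rather than the real hashtable.c implementation:

#include <stdbool.h>
#include <stdlib.h>

typedef struct iterNode iterNode;

typedef struct {
    bool valid; /* cleared when the owning table is released */
    /* ... position fields (table, bucket, pos_in_bucket) ... */
} toyIterator;

struct iterNode {
    toyIterator *it;
    iterNode *next;
};

typedef struct {
    iterNode *safe_iters; /* safe iterators registered at init time */
    /* ... buckets, metadata ... */
} toyHashtable;

/* On release, mark every registered safe iterator invalid, so that a
 * later next/reset on it returns false / becomes a no-op instead of
 * dereferencing freed memory. */
void toyHashtableRelease(toyHashtable *ht) {
    for (iterNode *n = ht->safe_iters; n;) {
        iterNode *next = n->next;
        n->it->valid = false;
        free(n);
        n = next;
    }
    free(ht);
}

bool toyNext(toyIterator *it) {
    if (!it->valid) return false; /* table gone or iteration finished */
    /* ... normal iteration ... */
    return true;
}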
One other potential issue is with defragmentation. If the hashtable is moved during defragmentation, any active safe iterators will access the deleted memory. I don't think this can happen with defrag's own iterator, but if there's a potential for other iterators, this might be a concern (e.g. forkless save?).
I think we should fix these bugs found by Jim in a separate PR. Essentially, each hashtable needs to keep track of a list of the safe iterators. (The same problems exist with dict, I imagine.) However, I don't think we allow long-lived iterators currently. Only a few operations are allowed while iterating (i.e. while the iterator exists), so letting the event loop run and later resume iteration is not currently supported, as I read these doc comments: https://github.com/valkey-io/valkey/blob/9.0/src/hashtable.c#L1960-L1965
I think this falls under the same category as ensuring someone doesn't call hashtableNext after it's returned false, but I don't mind if it's a separate PR. At least then let's fix the doc comments to clarify this: "Other activities (like event loop, defrag, threads) that could defrag/delete the hashtable must not be allowed to run while the iterator is considered valid."