Description
Scan is supposed to provide the following guarantees (as per https://redis.io/docs/manual/keyspace/):
A full iteration always retrieves all the elements that were present in the collection from the start to the end of a full iteration. This means that if a given element is inside the collection when an iteration is started, and is still there when an iteration terminates, then at some point SCAN returned it to the user.
A full iteration never returns any element that was NOT present in the collection from the start to the end of a full iteration. So if an element was removed before the start of an iteration, and is never added back to the collection for all the time an iteration lasts, SCAN ensures that this element will never be returned.
While playing around with a PoC for cluster-wide scan, I realized that the first guarantee can only hold as long as you stay connected to the same node for the entire duration of the scan. If a failover occurs, the replica may have a different seed value for the siphash function, so its hash table is laid out differently and the cursor previously used on the primary would not point to the same place. This is likely less of an issue for SCAN, since you normally indicate the node you're talking to, but many cluster-mode-enabled (CME) clients transparently handle redirects for SSCAN/HSCAN/ZSCAN during failovers.
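To make the failure mode concrete, here is a minimal sketch, not the server's actual implementation. It assumes a fixed-size table with no rehashing, uses keyed BLAKE2b as a stand-in for seeded SipHash, and walks buckets in plain increasing order instead of the reverse-binary cursor the real SCAN uses. The names `bucket_of`, `scan_buckets`, and `scan_with_failover` are hypothetical.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical fixed table size; no rehashing in this sketch


def bucket_of(member: str, seed: bytes) -> int:
    """Keyed hash standing in for seeded SipHash; returns the bucket index."""
    digest = hashlib.blake2b(member.encode(), key=seed, digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_BUCKETS


def scan_buckets(members, seed, start, stop):
    """Return the members a node with this seed stores in buckets [start, stop)."""
    return {m for m in members if start <= bucket_of(m, seed) < stop}


def scan_with_failover(members, primary_seed, replica_seed, failover_at):
    """Scan buckets [0, failover_at) on the primary, then resume on the replica."""
    seen = scan_buckets(members, primary_seed, 0, failover_at)
    seen |= scan_buckets(members, replica_seed, failover_at, NUM_BUCKETS)
    return seen


members = {f"member:{i}" for i in range(1000)}

# Same seed on both nodes: the resumed cursor covers the remaining buckets.
same = scan_with_failover(members, b"seed-A", b"seed-A", NUM_BUCKETS // 2)

# Different seeds: members the replica keeps in already-visited bucket
# positions are never returned, even though the collection never changed.
different = scan_with_failover(members, b"seed-A", b"seed-B", NUM_BUCKETS // 2)
missed = members - different

print(len(members - same), len(missed))
```

With identical seeds nothing is missed. With different seeds, any member that sits in the second half of the primary's table but in the first half of the replica's table is skipped, which breaks the first guarantee.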
I'm not sure we strictly need to do anything about SCAN, since I'm not sure how often a SCAN is resumed after a failover. The other commands might need a fix, though.
I think this is worth addressing, and it could be done in one of three ways:
Simply update the documentation to add the caveat. This doesn't feel right to me, because most users will not really be aware of this.
Add a configuration so that seeds can be set externally. This would allow operators to configure this consistency, but has limited benefit for those who don't know about it.
Allow replicas to sync their hash seed from their primaries. This makes their cursors consistent. We can also persist the seed into the RDB so that it is still accurate after a reload. I'm most in favor of this option.
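The third option can be sketched as follows, under the same simplifying assumptions as before (keyed BLAKE2b in place of SipHash, a fixed bucket count). The `Node` class, its `sync_from` method, and the `sync_seed` flag are hypothetical and only illustrate the idea of transferring the seed along with the data set.

```python
import hashlib
import os
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node model: a per-process hash seed plus a set of members."""
    seed: bytes = field(default_factory=lambda: os.urandom(16))
    members: set = field(default_factory=set)

    def bucket_of(self, member: str, num_buckets: int = 16) -> int:
        """Keyed hash standing in for seeded SipHash; returns the bucket index."""
        digest = hashlib.blake2b(member.encode(), key=self.seed, digest_size=8).digest()
        return int.from_bytes(digest, "big") % num_buckets

    def sync_from(self, primary: "Node", sync_seed: bool) -> None:
        """Copy the data set; optionally adopt the primary's seed as well."""
        self.members = set(primary.members)
        if sync_seed:
            self.seed = primary.seed


primary = Node(members={f"member:{i}" for i in range(100)})
replica = Node()  # starts with its own random seed
replica.sync_from(primary, sync_seed=True)

# With the seed synced, every member lands in the same bucket on both nodes,
# so a cursor obtained from the primary stays meaningful on the replica.
layout_matches = all(
    primary.bucket_of(m) == replica.bucket_of(m) for m in primary.members
)
print(layout_matches)
```

Persisting the seed in the RDB would extend the same idea to restarts: a node that reloads its data would also reload the seed the data was hashed with.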
ref: redis/redis#12440
We never came up with a cohesive answer here