-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fix sharded pub/sub handling over slot migrations #2969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Todo: think about what server we use for the reconfigure - the only server we know already knows the correct topology is: the one we're receiving from. Heck, we might even just be able to resend SSUBSCRIBE to that one, and follow the -MOVED |
But what if the server sent SUNSUBSCRIBE because it's leaving the cluster entirely (or going down)? Then it might not give us a MOVED nor a good response to CLUSTER NODES. By default the reconfigure would use the dedicated "discovery" connection right? |
|
AFAIK there is no such thing as a dedicacted "discovery" connection, except perhaps in sentinel scenarios; I need to think on this - there's a timing thing here; in the migration process, the new node definitely knows, the old node knows (unless it is going offline), but the other nodes might not know yet (announcing this eagerly is explicitly documented as an optional step, otherwise it'll get the updates state "eventually"). But by definition we don't know which the new node is yet! So; unless we ask everyone, I think the old node is our best bet, and it is almost certainly still in the cluster at this point. |
… where we should be subscribing
|
@philon-msft see 41685a5; all input welcome |
|
The discovery connection I was thinking of is the one we briefly tried to eliminate years ago in #1741 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget ReleaseNotes.md :)
| private PushKind GetPushKind(in Sequence<RawResult> result, out RedisChannel channel) | ||
| { | ||
| var len = result.Length; | ||
| if (len >= 2) // always have at least the kind and the subscription channel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: return early for this case (for readability). Or is that less efficient than a big if() block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. Any perf delta is negligible.
|
I suspect that discovery channel thing is conflating endpoints specified via the config vs endpoints from CLUSTER. If they overlap: there will be no additional. But if that connection is separate but is one of the actual nodes, we have no clue which it is - so that's just a dice roll. And if it isn't part of the active cluster, it is even less in-the-know. So: there is nothing special about that. HOWEVER: we (I) should take the time to fix that via CLUSTER SLOTS and the IP/DNS changes in v7+? v8+? |
On OSS redis, a sharded pub/sub slot migration is achieved by pushing an unsolicited
SUNSUBSCRIBEmessage, to say "stop listening". This actually breaks the pub/sub channel at the moment, as we don't anticipate that scenario.This PR fixes this, by:
SUNSUBSCRIBE, and handle it without dropping to the message processor - wipe the existing channel, and request reconfiguration (which indirectly causes resubscription when appropriate)-MOVEDwhen subscribing[FastHash]magic