Skip to content

Conversation

@mgravell
Copy link
Collaborator

@mgravell mgravell commented Oct 9, 2025

On OSS redis, a sharded pub/sub slot migration is achieved by pushing an unsolicited SUNSUBSCRIBE message, to say "stop listening". This actually breaks the pub/sub channel at the moment, as we don't anticipate that scenario.

This PR fixes this, by:

  • checking for unsolicited SUNSUBSCRIBE, and handle it without dropping to the message processor - wipe the existing channel, and request reconfiguration (which indirectly causes resubscription when appropriate)
  • adds tests for the above, with a slot migration during the test
  • adds tests to check what happens if we get -MOVED when subscribing
  • optimizes how we check for known OOB messages using the [FastHash] magic

@mgravell
Copy link
Collaborator Author

mgravell commented Oct 9, 2025

Todo: think about what server we use for the reconfigure - the only server we know already knows the correct topology is: the one we're receiving from. Heck, we might even just be able to resend SSUBSCRIBE to that one, and follow the -MOVED

@philon-msft
Copy link
Collaborator

Todo: think about what server we use for the reconfigure - the only server we know already knows the correct topology is: the one we're receiving from. Heck, we might even just be able to resend SSUBSCRIBE to that one, and follow the -MOVED

But what if the server sent SUNSUBSCRIBE because it's leaving the cluster entirely (or going down)? Then it might not give us a MOVED nor a good response to CLUSTER NODES. By default the reconfigure would use the dedicated "discovery" connection right?

@mgravell
Copy link
Collaborator Author

mgravell commented Oct 9, 2025

AFAIK there is no such thing as a dedicacted "discovery" connection, except perhaps in sentinel scenarios; I need to think on this - there's a timing thing here; in the migration process, the new node definitely knows, the old node knows (unless it is going offline), but the other nodes might not know yet (announcing this eagerly is explicitly documented as an optional step, otherwise it'll get the updates state "eventually"). But by definition we don't know which the new node is yet! So; unless we ask everyone, I think the old node is our best bet, and it is almost certainly still in the cluster at this point.

@mgravell
Copy link
Collaborator Author

mgravell commented Oct 10, 2025

@philon-msft see 41685a5; all input welcome

@philon-msft
Copy link
Collaborator

The discovery connection I was thinking of is the one we briefly tried to eliminate years ago in #1741

Copy link
Collaborator

@philon-msft philon-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget ReleaseNotes.md :)

private PushKind GetPushKind(in Sequence<RawResult> result, out RedisChannel channel)
{
var len = result.Length;
if (len >= 2) // always have at least the kind and the subscription channel
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: return early for this case (for readability). Or is that less efficient than a big if() block?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. Any perf delta is negligible.

@mgravell
Copy link
Collaborator Author

I suspect that discovery channel thing is conflating endpoints specified via the config vs endpoints from CLUSTER. If they overlap: there will be no additional. But if that connection is separate but is one of the actual nodes, we have no clue which it is - so that's just a dice roll. And if it isn't part of the active cluster, it is even less in-the-know. So: there is nothing special about that. HOWEVER: we (I) should take the time to fix that via CLUSTER SLOTS and the IP/DNS changes in v7+? v8+?

@mgravell mgravell requested a review from philon-msft October 13, 2025 09:43
@mgravell mgravell merged commit d5b445a into main Oct 13, 2025
9 of 11 checks passed
@mgravell mgravell deleted the marc/oss_ssubscribe branch October 13, 2025 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants