
Add support for cross-topic subscriptions. #511


Closed
marcselis opened this issue Mar 17, 2022 · 16 comments

Comments

@marcselis

We would like an endpoint to be able to subscribe to events that are published to a topic other than the one the endpoint itself uses.

Motivation:
We do not have a dedicated NSB operations team in our company. Instead, a number of independent DevOps squads each monitor their own NServiceBus endpoints.
Our business deals with highly sensitive data (personal and wage data), and ServiceControl and ServicePulse lack fine-grained security that could limit which endpoints and (failed) messages a user can see and act upon. We therefore cannot use a single ServiceControl, Audit & Monitor set-up to monitor all NSB endpoints. (See also ServicePulse issue #453.)
At the moment we have standardized on the SQL transport. Each of our squads has a dedicated schema in a single central NSB database, and each schema has its own ServiceControl, Audit & Monitor instance that the squad uses to monitor its own endpoints.
Cross-squad communication is easy with the SQL transport:

  • We give all our endpoints' db users rights to write to all tables in all schemas.
  • We only allow an endpoint's db user to read in its own schema.
  • You can tell the SQL transport in what schema a destination endpoint is located (using transport.UseSchemaForEndpoint()) and you can send messages to its queue.
  • For subscriptions, there is a single SubscriptionRouting table that is located in the dbo schema that all endpoints use, allowing an endpoint in one schema to subscribe to messages in another schema.

We now have a number of new applications hosted in Azure that want to use the Azure Service Bus transport, so we would like to replicate our current SQL set-up in Azure Service Bus. We discussed this with Particular Support (see support case #00063703) and together we came up with a single Azure Service Bus namespace with a separate topic for each squad, where each squad again has its own ServiceControl, Audit & Monitoring instance to monitor its own endpoints.
This set-up works more or less:

  • sending messages to endpoints from another squad works fine, as all queues reside in the same namespace.
  • however, subscribing to events published in another topic is not possible through the ASB Transport.

All of our squads use an in-house framework to configure NSB. In that framework we managed to write code that creates cross-topic subscriptions directly in ASB, using the ASB client that the NSB ASB transport references, but this is not very sustainable. For example, the latest version of the NSB ASB transport switched from the Microsoft.Azure.ServiceBus package to Azure.Messaging.ServiceBus, which broke our code when we upgraded.

But more importantly, we are also trying to set up a router to allow communication between the on-premises SQL transport and the ASB transport, and there we are totally stuck:

  • Sending & receiving messages from any SQL endpoint to any ASB endpoint works fine, as they all reside in the same namespace.
  • As the router uses this ASB transport, it is also not able to create subscriptions in topics other than its own.
@SzymonPobiega
Member

Hi @marcselis

If I understand correctly it is possible to achieve your goal with the current version of the router. Here is a draft PR that modifies the SQL Switch sample to route between ASB topics within the same namespace and a SQL transport. It uses a single instance of the Router with three interfaces.

@marcselis
Author

ok, but that still leaves the problem that an endpoint in an ASB namespace cannot subscribe to messages published by another endpoint in the same ASB namespace but using another topic to publish its messages.
Technically this is perfectly possible, as in the ASB portal there is no "link" between a queue and a topic. You just create a subscription on a topic and configure it to forward to any queue you want.
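
As a rough illustration of that point, the sketch below creates a subscription on the publisher's topic that forwards to a subscriber's input queue, using the administration client from the Azure.Messaging.ServiceBus package. It is not how the NServiceBus transport does it. The class name, the rule name, and the topic and queue names in the usage note are made up for this example, and the SQL filter on the NServiceBus.EnclosedMessageTypes header is an assumption about how published events can be matched.

```csharp
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus.Administration;

public static class CrossTopicSubscription
{
    // Describes a subscription that lives on the publisher's topic and
    // forwards matching messages to the subscriber's input queue.
    // Here the subscription is simply named after the queue
    // (ASB limits subscription names to 50 characters).
    public static (CreateSubscriptionOptions Subscription, CreateRuleOptions Rule) Build(
        string publisherTopic, string subscriberQueue, string eventTypeFullName)
    {
        var subscription = new CreateSubscriptionOptions(publisherTopic, subscriberQueue)
        {
            ForwardTo = subscriberQueue
        };

        // Assumption: the publisher puts the event type name in the
        // NServiceBus.EnclosedMessageTypes header, so a SQL filter can select it.
        var rule = new CreateRuleOptions(
            "CrossTopicRule", // hypothetical rule name
            new SqlRuleFilter($"[NServiceBus.EnclosedMessageTypes] LIKE '%{eventTypeFullName}%'"));

        return (subscription, rule);
    }

    // Creates the subscription only when it does not exist yet.
    // Requires Manage rights on the namespace. An existing subscription
    // is left untouched, including its rules.
    public static async Task EnsureAsync(
        ServiceBusAdministrationClient client,
        string publisherTopic, string subscriberQueue, string eventTypeFullName)
    {
        if ((await client.SubscriptionExistsAsync(publisherTopic, subscriberQueue)).Value)
        {
            return;
        }

        var (subscription, rule) = Build(publisherTopic, subscriberQueue, eventTypeFullName);
        await client.CreateSubscriptionAsync(subscription, rule);
    }
}
```

A call such as `await CrossTopicSubscription.EnsureAsync(new ServiceBusAdministrationClient(connectionString), "squad-a-topic", "billing", "Sales.OrderPlaced")` would then subscribe the hypothetical `billing` queue to an event published on the hypothetical `squad-a-topic`.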

@SzymonPobiega
Member

> ok, but that still leaves the problem that an endpoint in an ASB namespace cannot subscribe to messages published by another endpoint in the same ASB namespace but using another topic to publish its messages.

That's what that sample solves, via the router.

@marcselis
Author

To me that feels like using a cannon to shoot a fly. As I explained in the router issue, we currently have 3 or 4 topics in the same namespace, but that number will organically grow to 20+ in the next few years. Every time a new topic is added, we would need to change, test & redeploy our router to support routing messages between the new topic and all other topics.
All to get something to work that already works when using ASB directly:
If I go to the Azure portal and manually create a new subscription in topic A and point it to the receiving queue of an endpoint that is using topic B, it works. No router required. It would be really nice if the NSB ASB transport could support that out of the box, so that we don't need to create the subscriptions manually and use the router for what it does best: routing between different transports.
I'm a huge fan of NServiceBus, but I'm having a hard time convincing my colleagues to use it, instead of directly using the ASB API as they currently do. They only see what it can't do, and not (yet) the huge benefits they get in return...

@SzymonPobiega
Member

I see your point. I looked at the command line tool for the ASB transport and it seems that it would be possible to use it to achieve your goal.

The subscribe command implemented in this class allows passing the name of the topic. With this method you can subscribe an endpoint to an event published on a topic other than the endpoint's own topic.

We recommend always scripting the subscriptions for production deployment and not relying on the auto-subscribe capability, so that should not add any complexity to your deployment. The reason for this recommendation is the unpredictable nature of auto-subscribe.

@marcselis
Author

Thanks, that will definitely help.

I've read the recommendation of not relying on the auto-subscribe & auto queue creation capabilities, but so far we have been using it for the deployment of all of our endpoints without any problems.
I can relate to the fact that it makes things more explicit and is more secure, as your endpoints don't need admin rights to create tables or queues. But it complicates the deployment a lot, in the sense that in order to deploy an endpoint correctly you also need to know whether the endpoint is processing new event types, because then you also need to create the new subscriptions.
Failing to do so won't crash your endpoint, it will just not receive those new events.
And that is much harder to detect! It is very difficult to find out what events have been missed and get them republished without any consequences for other subscribers that did process them.
That is the main reason why we still rely on the auto* capabilities.

@SzymonPobiega
Member

@marcselis I understand. I labeled it as a candidate for future enhancement release. That does not mean that we'll handle it in the very next minor release -- this is still a subject for prioritization done by the team that works on the release.

Regarding the missing events, I think this topic is also very interesting. If I understand you correctly, the missing thing is a mechanism that would ensure that every subscriber that needs to process the event is subscribed when the publisher starts publishing the events. Let me validate if I understand the scenario correctly.

Suppose there is an endpoint that processes DoSomething commands and updates its database. Let's call it A. At some point in time the team responsible for endpoint A decides that it grew too big and wants to split the responsibility for the DoSomething command into three parts:

  • A is going to do one part and publish the SomethingAlmostDone event
  • new endpoints B and C are going to subscribe to the SomethingAlmostDone event and conduct their pieces of business logic

From the outside the work that needs to be done did not change, it has just been split into three parts.

So in that case, when the change is going to be deployed, I need to:

  • Deploy the B and C endpoints
  • Make sure they are subscribed to the SomethingAlmostDone event (although there is no publisher of that event yet, since the old version of A continues to do the whole work)
  • Shut down A and deploy the new version of the A binaries, which only do part of the work and publish the event
  • Start A again and begin processing according to the new rules

Now, if the second step (subscription) failed silently then either B or C or both will skip some events and it will be very hard to publish them retroactively.

Does my description reflect well the reality of the problems you are facing?

@marcselis
Author

That is a good example, but it doesn't have to be that complicated.

Let me give you another example:
We need to send a lot of declarations to the government. That is done by uploading xml files to a dedicated sftp location. The government then processes these files and drops a response in another sftp location where we need to fetch it.
We have created a central process to send the files and receive all responses. That central process downloads all files that are in our inbox folder and publishes an event for each. We have different subscribers that examine the file (name and contents) to check whether the response is for them. (Sometimes one response file needs to be processed by multiple processes.)

Suppose we have a service running that due to changed legislation suddenly needs to send a declaration to the government and process its response. We do this by deploying a new version that will now create the xml file on a central place and send a command to the central service to upload that file to the government. But of course it also needs to subscribe to the events that get published by the central component that indicate that a file was downloaded.

  • Using auto-subscribe this works out of the box: once the pull-request with the changes is approved, a deploy pipeline is triggered that deploys the new version. When the version starts it creates the missing subscription for the IDownloadedAFile event and sees the response when it arrives.
  • If you manage the subscriptions in the deploy pipeline, one must not forget to alter the pipeline to create the new subscription before approving the pull request. Otherwise the endpoint is automatically deployed, sends out the command to upload the file to the government, but misses the event that contained the response, because there was no subscription created.

We receive 100K files per day from the government. So you can imagine the mess finding out the exact events that were missed and resend those to that single endpoint that needed them.

IMO it would be best to split the auto-subscription mechanism into 2 parts:

  1. detection of event handlers and of missing subscriptions for them, and
  2. creation of the missing subscriptions.

Part 1 should always run, and it should fail when it detects a missing subscription while part 2 is disabled.
That would make the newly deployed endpoint crash when the pipeline didn't create the needed subscription, before it gets the chance to send out a file, so no events are lost.
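
The proposed split could be sketched roughly as follows. This is plain C# with no NServiceBus dependency: `IHandleEvent<T>`, `SubscriptionStartupCheck`, and the set of existing subscription names are hypothetical stand-ins invented for this illustration, and how an endpoint would actually learn which subscriptions exist depends on the transport.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical marker interface standing in for an event handler contract.
public interface IHandleEvent<TEvent> { }

public interface IDownloadedAFile { }
public class ResponseHandler : IHandleEvent<IDownloadedAFile> { }

public sealed class MissingSubscriptionException : Exception
{
    public MissingSubscriptionException(IReadOnlyCollection<Type> missing)
        : base("No subscription found for: " + string.Join(", ", missing.Select(t => t.FullName)))
        => Missing = missing;

    public IReadOnlyCollection<Type> Missing { get; }
}

public static class SubscriptionStartupCheck
{
    // Part 1a: find every event type that some handler type handles.
    public static IReadOnlyCollection<Type> DetectHandledEvents(IEnumerable<Type> handlerTypes) =>
        handlerTypes
            .SelectMany(t => t.GetInterfaces())
            .Where(i => i.IsGenericType && i.GetGenericTypeDefinition() == typeof(IHandleEvent<>))
            .Select(i => i.GetGenericArguments()[0])
            .Distinct()
            .ToList();

    // Part 1b always runs: compare handled events with existing subscriptions.
    // Part 2 (creation) only runs when autoCreate is true; otherwise start-up fails.
    public static void Run(
        IEnumerable<Type> handlerTypes,
        ISet<string> existingSubscriptions,
        bool autoCreate,
        Action<Type> createSubscription)
    {
        var missing = DetectHandledEvents(handlerTypes)
            .Where(e => !existingSubscriptions.Contains(e.FullName ?? e.Name))
            .ToList();

        if (missing.Count == 0) return;
        if (!autoCreate) throw new MissingSubscriptionException(missing);

        foreach (var eventType in missing) createSubscription(eventType);
    }
}
```

With `autoCreate` set to false and no subscription for `IDownloadedAFile`, `Run` throws before the endpoint does any work, which is the crash-early behaviour described above.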

@ramonsmits
Member

ramonsmits commented Jun 21, 2024

Topic per event topology

A very simple variation would be to use a topic per event. Yes, this has some potential drawbacks, such as not supporting inheritance and the limit on the number of topics, but these are likely not an issue for most customers.

That is also solvable by grouping events into a single topic. For example, per assembly which could represent all events published by a service boundary or a specific component. It could also just use a "correlation filter" instead of the "SQL filter" and result in improved performance.

Strict validation on not allowing inheritance

NServiceBus could, for example, have a default validation that only allows events to be published that

  • are sealed
  • do not implement public interfaces (private ones are allowed, as they can help with partition keys and multi-handlers)
  • do not use inheritance

Message examples:

// ✅
public sealed record MyEvent : IPrivate
{
}

// ❌
public record MyBase
{
}

// ❌
public sealed record MyOtherEvent : MyBase
{
}

Potentially users could opt-out of the validation with a "trust me I know what I'm doing" type of method.
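
A minimal sketch of what such a validation could look like with plain reflection is shown below. `EventTypeValidator` and its opt-out flag are hypothetical names for this illustration and not an NServiceBus API. One detail the sketch has to account for: C# records always implement the public `IEquatable<T>` interface, so that compiler-added interface is ignored here.

```csharp
using System;
using System.Collections.Generic;

internal interface IPrivate { }   // non-public marker interface, allowed

public sealed record MyEvent : IPrivate;       // passes
public record MyBase;                          // fails: not sealed
public sealed record MyOtherEvent : MyBase;    // fails: uses inheritance

public static class EventTypeValidator
{
    // Returns a list of rule violations; an empty list means the type is publishable.
    public static IReadOnlyList<string> Validate(Type eventType)
    {
        var problems = new List<string>();

        if (!eventType.IsSealed)
            problems.Add("type is not sealed");

        if (eventType.BaseType != null && eventType.BaseType != typeof(object))
            problems.Add($"type inherits from {eventType.BaseType.Name}");

        foreach (var i in eventType.GetInterfaces())
        {
            // Records get IEquatable<T> from the compiler; do not count it.
            var isEquatable = i.IsGenericType && i.GetGenericTypeDefinition() == typeof(IEquatable<>);
            if (i.IsVisible && !isEquatable)
                problems.Add($"type implements public interface {i.Name}");
        }

        return problems;
    }

    // Opt-out in the spirit of "trust me, I know what I'm doing".
    public static void ThrowIfInvalid(Type eventType, bool skipValidation = false)
    {
        if (skipValidation) return;

        var problems = Validate(eventType);
        if (problems.Count > 0)
            throw new InvalidOperationException(
                $"{eventType.Name} cannot be published: {string.Join("; ", problems)}");
    }
}
```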

Topic grouping

Grouping events on the same topic based on an attribute like

[NServiceBus.Transport.AzureServiceBus.Topic("Sales")]
public sealed record MyEvent : IPrivate
{
}

or convention:

// Use assembly name
transport.TopicProvider = (type) => type.Assembly.GetName().Name;
// .. or namespace
transport.TopicProvider = (type) => type.Namespace;

It could even allow multiple topics for the same message, although that likely would not be recommended:

[NServiceBus.Transport.AzureServiceBus.Topic("Sales")]
[NServiceBus.Transport.AzureServiceBus.Topic("Finance")]
public sealed record MyEvent : IPrivate
{
}

or convention:

// Use both assembly name and namespace
transport.Advanced.MultiTopicProvider = (type) => new []{ type.Assembly.GetName().Name, type.Namespace };
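
To make the lookup order concrete, here is a small self-contained sketch in which explicit attributes win and a convention is the fallback. `TopicAttribute` and `TopicResolver` are defined locally for this illustration; they only mimic the proposed attribute and provider and are not part of any released transport API.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Local stand-in for the proposed attribute; AllowMultiple enables multi-topic events.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class TopicAttribute : Attribute
{
    public TopicAttribute(string name) => Name = name;
    public string Name { get; }
}

public static class TopicResolver
{
    // Explicit [Topic] attributes take precedence; otherwise the convention decides.
    public static string[] Resolve(Type eventType, Func<Type, string> convention)
    {
        var explicitTopics = eventType
            .GetCustomAttributes<TopicAttribute>()
            .Select(a => a.Name)
            .ToArray();

        return explicitTopics.Length > 0
            ? explicitTopics
            : new[] { convention(eventType) };
    }
}

namespace Sales.Events
{
    [Topic("Sales")]
    public sealed record OrderPlaced;

    [Topic("Sales")]
    [Topic("Finance")]
    public sealed record InvoiceIssued;

    // No attribute: the topic comes from the convention.
    public sealed record OrderShipped;
}
```

For example, `TopicResolver.Resolve(typeof(Sales.Events.OrderShipped), t => t.Namespace)` falls back to the namespace convention, while `InvoiceIssued` resolves to both of its attribute-declared topics regardless of the convention passed in.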

@SzymonPobiega
Member

Hey @marcselis

Regarding subscribing to different topics, have you managed to solve that problem somehow? We are about to release a new version of the transport that uses topic-per-event topology that should be better suited to your architecture. Given services A and B, if A publishes event Event1, you could use a convention to name the topic to publish to A.Event1. Then B would use that name to subscribe to these events.

@marcselis
Author

No, apart from the solutions I already described: extending the NSB packages to add the subscriptions ourselves, and using the command-line tools to create the subscriptions in the pipelines. As a matter of fact, because of the problems I mentioned in this issue, combined with some performance issues we experienced with the ASB SQL filter, we decided to remove ASB from the list of recommended transports to use with NServiceBus.
We do have some endpoints running in ASB (20-30 in production), but most of them don't use NSB, and those that do (2) do not need cross-topic subscriptions.
But we will investigate this new topic-per-event topology once it is released.

Is there any progress in detecting missing subscriptions when autosubscribe is off? This is a broader problem than just the ASB transport, though.

@SzymonPobiega
Member

> Is there any progress in detecting missing subscriptions when autosubscribe is off? This is a broader problem than just the ASB transport, though

Let me verify that I understand the problem correctly. The issue is caused by the new endpoint sending a command but missing the published response. The command is sent based on some data that the endpoint has access to and that is stored on a permanent medium (it does not go away).

So the goal is to prevent the endpoint from "consuming" that data and sending a command based on it when the subscription is not present, right?

@marcselis
Author

marcselis commented Feb 21, 2025

Not necessarily. The scenario is as follows: I have an (existing or new) endpoint that has a new handler to process a certain type of event. My endpoint has autosubscribe disabled, according to your best practices. If I then forget to create that new subscription in my deployment pipeline, the endpoint will be deployed and will run happily, but it will never receive those events.

It can take a long time before this problem is detected. Operations will not see it, because everything is running fine and there are no errors. So, by the time 2nd- or 3rd-line support starts investigating the ticket saying that their product is not doing what it is supposed to do, the endpoint has already missed a lot of events that it was supposed to process. The support team then needs to manually recreate the missed events and place them in the endpoint's input queue, which usually takes a lot of effort.

It would be nice to have some "validation" mechanism during endpoint start-up that detects all event handlers in the endpoint and issues a warning, or even a fatal error, when a subscription for an event can not be found and autosubscribe is disabled.

As I said in my previous message, this is more of a "functional" problem that does not depend on the transport being used.

@SzymonPobiega
Member

Thanks for the clarification. I was asking because we found it difficult to implement when spiking. We assumed that people who disable autosubscribe do it because they have a strict security policy that prevents running their endpoints with high privileges.

If that is the case, at least with Azure Service Bus, the endpoint is not able to read any information about the broker topology, including whether a subscription exists. So it seems to be an all-or-nothing case with ASB. Either the endpoint can read/modify anything (in which case autosubscribe is the obvious option) or it can't know anything about the broker.

I looked at the NServiceBus and ASB transport code, and a potential improvement could be to add a switch to the transport that makes it not create any entities but just verify that they exist (that would still require namespace management rights). Would that solve your issue?

@marcselis
Author

marcselis commented Feb 27, 2025

That would already be an improvement that could help us start improving the security of our ASB setup. But in the end our internal audit team really wants to manage the subscriptions centrally and make it impossible for inventive developers to (for example) add subscriptions to events that are published about changes in the wages of colleagues or friends, etc.

I've created a feature request in the Azure SDK repo to make checking existence of ASB resources possible without management rights.

@SzymonPobiega
Member

@marcselis I extracted the subscription validation aspect of this issue here Particular/NServiceBus#7311. Could you validate if I captured the problem accurately?
