Commit 25f1b9a

udidahan, msimecek, ckittel, and Ed Price - MSFT authored
Added description of failure scenarios when code uses databases and brokers together (#3609)
* Added description of failure scenarios when code uses databases and brokers together
* Update mission-critical-data-platform.md
* Fix link
* Update mission-critical-data-platform.md
* Update docs/reference-architectures/containers/aks-mission-critical/mission-critical-data-platform.md
  Co-authored-by: Martin Simecek <[email protected]>
* Update docs/reference-architectures/containers/aks-mission-critical/mission-critical-data-platform.md
  Co-authored-by: Martin Simecek <[email protected]>
* Update docs/reference-architectures/containers/aks-mission-critical/mission-critical-data-platform.md
* Update mission-critical-data-platform.md
  Let's lead the RFC quote to set it up better. Finished edit pass.

Co-authored-by: Martin Simecek <[email protected]>
Co-authored-by: Chad Kittel <[email protected]>
Co-authored-by: Ed Price - MSFT <[email protected]>
1 parent 15210cc commit 25f1b9a

1 file changed: +11 -3 lines changed


docs/reference-architectures/containers/aks-mission-critical/mission-critical-data-platform.md

Lines changed: 11 additions & 3 deletions
@@ -4,7 +4,7 @@ description: Data decisions for the baseline reference architecture for a missio
 author: msimecek
 categories: database
 ms.author: csiemens
-ms.date: 08/18/2022
+ms.date: 09/20/2022
 ms.topic: conceptual
 ms.service: architecture-center
 ms.subservice: reference-architecture
@@ -157,8 +157,16 @@ Azure Service Bus premium tier is the recommended solution for high-value messag
 - If an acknowledgment isn't received by the broker in the allotted time period, or the handler explicitly abandons the message, the exclusive lock is released. The message is then available for other consumers to process the message.
 - If a message is not successfully processed a configurable number of times, or the handler forwards the message to the dead-letter queue.
 - Because messages can potentially be processed more than one time, message handlers should be made idempotent.
+- In [RFC 7231](https://tools.ietf.org/html/rfc7231#section-4), the Hypertext Transfer Protocol states, "A ... method is considered _idempotent_ if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request."
+- One common technique of making message handling idempotent is to check a persistent store, like a database, if the message has already been processed. If it has been processed, you wouldn't run the logic to process it again.
+- There might be situations where the processing of the message includes database operations, specifically the insertion of new records with database-generated identifiers. New messages can be emitted to the broker, which contain those identifiers. Because there aren't distributed transactions that encompass both the database and the message broker, there can be a number of complications that can occur if the process running the code happens to fail. See the following example situations:
+- The code emitting the messages might run before the database transaction is committed, which is how many developers work using the [Unit of Work pattern](https://www.programmingwithwolfgang.com/repository-and-unit-of-work-pattern). Those messages can _escape_, if the failure occurs between calling the broker and asking that the database transaction be committed. As the transaction rolls back, those database-generated IDs are also undone, which leaves them available to other code that might be running at the same time. This can cause recipients of the _escaped_ messages to process the wrong database entries, which hurts the overall consistency and correctness of your system.
+- If developers put the code that emits the message *after* the database transaction completes, the process can still fail between these operations (transaction committed - message sent). When that happens, the message will go through processing again, but this time the idempotence guard clause will see that it has already been processed (based on the data stored in the database). The clause will skip the message emitting code, believing that everything was done successfully last time. Downstream systems, which were expecting to receive notifications about the completed process, do not receive anything. This situation again results in an overall state of inconsistency.
+- The solution to the above problems involves using the [Transactional Outbox pattern](/azure/architecture/best-practices/transactional-outbox-cosmos), where the outgoing messages are stored _off to the side_, in the same transactional store as the business data. The messages are then transmitted to the message broker, when the initial message has been successfully processed.
+- Since many developers are unfamiliar with these problems or their solutions, in order to guarantee that these techniques are applied consistently in a mission-critical system, we suggest you have the functionality of the outbox and the interaction with the message broker wrapped in some kind of library. All code processing and sending transactionally-significant messages should make use of that library, rather than interacting with the message broker directly.
+- Libraries that implement this functionality in .NET include [NServiceBus](https://docs.particular.net/nservicebus/outbox) and [MassTransit](https://masstransit-project.com/advanced/transactional-outbox.html).
 - To ensure that messages sent to the dead-letter queue are acted upon, the dead-letter queue should be monitored, and alerts should be set.
-- The system should have tooling for operators to be able to inspect, correct and resubmit messages.
+- The system should have tooling for operators to be able to inspect, correct, and resubmit messages.

 ### High availability and disaster recovery

@@ -198,4 +206,4 @@ The health of the messaging system must be considered in the health checks for a
 Deploy the reference implementation to get a full understanding of the resources and their configuration used in this architecture.

 > [!div class="nextstepaction"]
-> [Implementation: Mission-Critical Online](https://github.com/Azure/Mission-Critical-Online)
+> [Implementation: Mission-Critical Online](https://github.com/Azure/Mission-Critical-Online)
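
Note on the added text: the idempotence check and the Transactional Outbox it describes can be sketched roughly as follows. This is a minimal illustration only, not code from this commit or from the reference implementation. It assumes an in-memory SQLite database as the transactional store, and the table names (`orders`, `processed`, `outbox`), the function names, and the `send` callable standing in for a broker client are all hypothetical.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the transactional store (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE orders    (id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT NOT NULL);
        CREATE TABLE processed (message_id TEXT PRIMARY KEY);
        CREATE TABLE outbox    (id INTEGER PRIMARY KEY AUTOINCREMENT,
                                body TEXT NOT NULL,
                                sent INTEGER NOT NULL DEFAULT 0);
        """
    )
    return db


def handle(db: sqlite3.Connection, message_id: str, item: str) -> None:
    """Idempotent handler: business row, dedup marker, and outgoing message commit together."""
    with db:  # one local transaction: commit on success, rollback on exception
        seen = db.execute(
            "SELECT 1 FROM processed WHERE message_id = ?", (message_id,)
        ).fetchone()
        if seen is not None:
            return  # guard clause: business logic already ran for this message
        order_id = db.execute(
            "INSERT INTO orders (item) VALUES (?)", (item,)
        ).lastrowid
        db.execute("INSERT INTO processed (message_id) VALUES (?)", (message_id,))
        # The outgoing message is stored "off to the side" instead of being sent here,
        # so it cannot escape a rollback and cannot be lost after a commit.
        db.execute(
            "INSERT INTO outbox (body) VALUES (?)",
            (json.dumps({"order_id": order_id}),),
        )


def dispatch(db: sqlite3.Connection, send) -> int:
    """Forward unsent outbox rows to the broker; runs after every handle(), redeliveries included."""
    rows = db.execute(
        "SELECT id, body FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, body in rows:
        send(body)  # if this raises, the row stays unsent and is retried later
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


def demo():
    """Simulate a failure between commit and send, followed by a redelivery."""
    db = open_store()
    broker = []  # stand-in for the real message broker

    def failing_send(body):
        raise ConnectionError("process failed before the broker call completed")

    handle(db, "msg-1", "widget")
    try:
        dispatch(db, failing_send)
    except ConnectionError:
        pass  # the incoming message is not acknowledged, so the broker redelivers it

    handle(db, "msg-1", "widget")  # redelivery: the guard clause skips the insert
    dispatch(db, broker.append)    # the pending outbox row is still sent

    order_count = db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    return order_count, broker


if __name__ == "__main__":
    print(demo())
```

In this sketch the redelivered message creates no second order, yet the downstream notification still goes out, because dispatching is driven by the outbox table, not by the guard clause. Delivery from the outbox is at-least-once (a failure between `send` and the `UPDATE` resends the row), so downstream handlers would also need to be idempotent.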

0 commit comments

Comments
 (0)