Distributed nature of Microservices and things we should be wary of!
We live in a digital world where most businesses, whether small, medium, or enterprise-class, have a presence on the web. The key to success in today’s fast-paced, highly competitive environment is to stay ahead of others with a captivating user experience, delivered by performant components running under the hood. On top of this, user requests can originate from a variety of devices: a desktop, a laptop, a tablet, a smartphone, or a smartwatch.
How to be relevant and competitive?
Not long ago, we used to think of ‘everything’ as part of one big monolithic system designed to serve everything (literally everything) that landed on it; our imagination didn’t take us beyond the common practice of slicing the monolith into logical Presentation, Business, and Persistence layers. But these monoliths could not withstand the test of time, and that led technologists to think of systems that can sustain the ever-growing demands of:
- Scalability
- Availability
- Reliability
- Time to market
- Adopting new technologies
Microservices come to the rescue!
The idea is to break the big monolith into smaller, inter-related domains. How that is done can depend on various factors, including how closely we want our domains to align with our business functions. This makes our lives a lot easier, as each domain can have a dedicated team, a preferred technical stack, and decentralized control, and can be scaled independently of the others.
As said before, these domains need to work in synergy and together serve the entire ecosystem. The big questions are: How do we handle requests that span multiple domains? Are there areas teams need to be aware of as they start? Are there fallacies around distributed architectures? Well, yes, there are a few, commonly known as the fallacies of distributed computing. This brings us to the heart of this article.
Distributed nature of Microservices
When the whole is broken into multiple logical components, the call flow more often than not spans several of those components or microservices. In this article we will not delve into the ideal size of a microservice or the boundaries it should adhere to; we will assume a system that consists of multiple such services that together serve as a whole. Enough theory; let’s put some pragmatic context around what we have been talking about so far.
Synchronous Integration
Let’s say there’s a web application with an Orchestrator service that routes each request to Service A and then to Service B.
To process a request originating from the Actor, the Orchestrator has to call Service A followed by Service B. As long as things (network, infra, etc.) are working fine, everyone is happy. But just as we have off days, these things have off days too, and when that happens we are in big trouble. Let’s say the mutation in Service A was successful, but the call to Service B failed. Service A has now changed state while Service B has not, so the overall system is left inconsistent.
Fallback option(s)
The easiest thing one can do is to have a retry mechanism that repeats the execution on the failed service (typically a configurable number of times, with exponentially increasing intervals between attempts). For this to be successful, all the endpoints on the services must be idempotent, so that invoking them multiple times does not result in an inconsistent state. This is easier said than done, as the design and implementation teams need to keep it in mind throughout the modeling.
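As a minimal sketch of retrying against an idempotent endpoint, consider the snippet below. The names `retry_with_backoff`, `IdempotentService`, `credit`, and the request ID `req-42` are hypothetical and chosen only for illustration; the simulated failure is a response lost after the service has already applied the change, which is exactly the case where a non-idempotent endpoint would apply it twice.

```python
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on ConnectionError with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 0.1s, 0.2s, 0.4s, ...
    raise AssertionError("unreachable")


class IdempotentService:
    """Hypothetical service that deduplicates calls by request ID."""

    def __init__(self) -> None:
        self.balance = 0
        self._results: Dict[str, int] = {}  # request_id -> result already produced

    def credit(self, request_id: str, amount: int) -> int:
        if request_id in self._results:
            return self._results[request_id]  # duplicate: do not apply again
        self.balance += amount
        self._results[request_id] = self.balance
        return self.balance


service = IdempotentService()
lost_responses = [True, True]  # the first two responses get lost in transit


def call_service() -> int:
    result = service.credit("req-42", 100)  # the mutation happens every attempt
    if lost_responses:
        lost_responses.pop()
        raise ConnectionError("response lost after the work was done")
    return result


delays: List[float] = []  # record delays instead of actually sleeping
final_balance = retry_with_backoff(call_service, sleep=delays.append)
```

The service is invoked three times here, yet the amount is credited only once because the request ID is remembered. In a real service that deduplication record would have to be stored durably alongside the data it protects.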
Another option that comes in handy is a Rollback/Compensation mechanism (Saga Pattern), wherein the Orchestrator undoes everything that was mutated as part of the happy flow. This is also a painstaking exercise, as the service where the mutation happened needs to either retain its prior state or know how to return to it if and when needed.
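The compensation idea can be sketched as an orchestrator that pairs each action with an undo step and runs the undo steps in reverse order when a later action fails. Everything here (`run_saga`, the `service_a` dictionary, the status values) is a hypothetical illustration, not a prescribed implementation; real sagas also have to persist their progress and cope with compensations that fail.

```python
from typing import Callable, Dict, List, Tuple

# Each step is (action, compensation); both are plain callables in this sketch.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        action, _ = step
        try:
            action()
            completed.append(step)
        except Exception:
            for _, compensate in reversed(completed):
                compensate()
            return False
    return True


# Hypothetical Service A state; it remembers its prior value so it can go back.
service_a: Dict[str, str] = {"status": "NEW"}
previous: Dict[str, str] = {}
log: List[str] = []


def mutate_service_a() -> None:
    previous["status"] = service_a["status"]  # keep the prior state
    service_a["status"] = "CONFIRMED"
    log.append("A mutated")


def undo_service_a() -> None:
    service_a["status"] = previous["status"]  # restore the prior state
    log.append("A compensated")


def mutate_service_b() -> None:
    raise RuntimeError("Service B is unavailable")


def undo_service_b() -> None:
    log.append("B compensated")


succeeded = run_saga([
    (mutate_service_a, undo_service_a),
    (mutate_service_b, undo_service_b),
])
```

Because Service B never completed its step, only Service A is compensated, and its status returns to the value it had before the request.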
Asynchronous Integration
Now let’s talk about another common way of integration that we see in a microservice world.
Here, Service A publishes an event on a Topic created on a Message Broker. Service A waits for the Ack from the broker before considering it a done deal, i.e. before generating a response back to the caller. The problem with this approach is not drastically different from the one we saw with Synchronous Integration. Let’s say Service B, after polling messages from the broker, fails to process them successfully; that leaves Service A and Service B out of sync, and we are back to solving a similar problem.
Fallback option(s)
In any integration approach, we have to focus on the potential areas of concern. In this case, if the request fails on Service A itself before it mutates its database, the system is not majorly impacted. However, if the mutation on Service A succeeded but the mutation on Service B failed, that’s something to worry about.
Tricks like manual commit can come in handy, wherein we control the lifetime of the messages in the broker. What this means is that the broker holds on to a message until Service B sends a manual commit back, informing the broker that it can advance its offset and, depending on the broker, remove the message from its persistent store. Again, we need to ensure that the subscriber, in this case Service B, is idempotent and can handle repeated deliveries.
Is this the best we can do? In comes the ‘Transactional Outbox Pattern’.
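To make the manual commit idea concrete, here is a small in-memory simulation rather than a real broker client. `InMemoryBroker`, `IdempotentSubscriber`, and `consume` are hypothetical names, and the single-consumer offset model is an assumption made to keep the sketch short. The subscriber crashes once after applying a message but before committing, so the same message is delivered again.

```python
from typing import Dict, List, Optional, Set


class InMemoryBroker:
    """Toy stand-in for one topic with one consumer and a committed offset."""

    def __init__(self) -> None:
        self._log: List[Dict] = []
        self._committed = 0  # index of the next message to deliver

    def publish(self, message: Dict) -> None:
        self._log.append(message)

    def poll(self) -> Optional[Dict]:
        """Return the first uncommitted message, or None when caught up."""
        if self._committed < len(self._log):
            return self._log[self._committed]
        return None

    def commit(self) -> None:
        """Manual commit: move the offset past the message just processed."""
        self._committed += 1


class IdempotentSubscriber:
    """Plays the role of Service B; skips messages it has already applied."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()
        self.total = 0
        self.crash_once = True  # simulate one crash between apply and commit

    def handle(self, message: Dict) -> None:
        if message["id"] not in self.seen_ids:
            self.seen_ids.add(message["id"])
            self.total += message["amount"]
        if self.crash_once:
            self.crash_once = False
            raise RuntimeError("crashed before commit")


def consume(broker: InMemoryBroker, subscriber: IdempotentSubscriber,
            max_polls: int = 10) -> None:
    """Commit only after successful handling; otherwise the message is redelivered."""
    for _ in range(max_polls):
        message = broker.poll()
        if message is None:
            break
        try:
            subscriber.handle(message)
        except RuntimeError:
            continue  # no commit, so the next poll returns the same message
        broker.commit()


broker = InMemoryBroker()
broker.publish({"id": "evt-1", "amount": 10})
broker.publish({"id": "evt-2", "amount": 5})
subscriber = IdempotentSubscriber()
consume(broker, subscriber)
```

The first message is handled twice but counted once, which is why manual commit and idempotency go together: committing late gives at-least-once delivery, and the subscriber has to absorb the duplicates.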
What is the Transactional Outbox Pattern and how can it be beneficial?
The details of the pattern can be found on microservices.io. At a high level, the approach involves having a store (an outbox) for all the events that a given microservice publishes in response to some activity. The transaction on the service should encompass both the mutation of its business data and the persisting of these messages in the same database. Only when both operations succeed is the transaction committed. Let’s see what that looks like.
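A minimal sketch of that single transaction, using Python’s built-in `sqlite3` module as a stand-in for Service A’s database. The `orders` and `outbox` tables, the `place_order` function, and the `OrderPlaced` event are hypothetical names invented for the example; the `fail_before_commit` flag exists only to show that a failure rolls back both writes together.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " topic TEXT NOT NULL,"
    " payload TEXT NOT NULL)"
)


def place_order(db: sqlite3.Connection, order_id: int,
                fail_before_commit: bool = False) -> None:
    """Write the business row and its event in one local transaction."""
    with db:  # commits on success, rolls back if an exception escapes
        db.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'PLACED')", (order_id,)
        )
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps({"order_id": order_id, "event": "OrderPlaced"})),
        )
        if fail_before_commit:
            raise RuntimeError("simulated failure inside the transaction")


place_order(conn, 1)  # both rows are committed together
try:
    place_order(conn, 2, fail_before_commit=True)  # both rows are rolled back
except RuntimeError:
    pass

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
outbox_count = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

Since the event lives in the same database as the business data, there is never an order without its event or an event without its order.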
The main transaction ends as soon as Service A’s data is mutated and the messages that need to be sent as notifications are persisted in the database. This means we need an asynchronous component, a relayer, that publishes these messages to whoever is interested in receiving them; in our example, that is Service B. The relayer’s responsibility is to ensure that the messages reach the message broker even if the broker was not available when the relayer first attempted to publish them. Until the messages are relayed successfully to the message broker, they remain in Service A’s database. We also need to ensure that Service B follows the manual commit strategy, so that it can rely on the persistence capabilities of the broker in case it is unable to process the messages because of a problem in the code, the infra, or something else.
As long as the intent of the application is just to notify Service B of any activity within Service A, this approach should serve us well, as the relayer ensures the system is eventually consistent. But if we need the application to be strongly consistent, we have to consider Synchronous Integration with the Fallback option(s) described above.
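The relayer’s behaviour can be sketched as a function that is run periodically: it reads pending rows from the outbox, publishes them in order, and deletes a row only after the broker has accepted it. `FlakyBroker` and `relay_once` are hypothetical names, the outbox schema is an assumption, and the broker is simulated so that it is unavailable on the first attempt.

```python
import sqlite3
from typing import List, Tuple


class FlakyBroker:
    """Hypothetical broker client that rejects the first few publish calls."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.delivered: List[Tuple[str, str]] = []

    def publish(self, topic: str, payload: str) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("broker not available")
        self.delivered.append((topic, payload))


def relay_once(db: sqlite3.Connection, broker: FlakyBroker) -> int:
    """Publish pending outbox rows in order; stop at the first failure."""
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox ORDER BY id"
    ).fetchall()
    relayed = 0
    for row_id, topic, payload in rows:
        try:
            broker.publish(topic, payload)
        except ConnectionError:
            break  # keep this row and the rest for the next run
        with db:  # delete only after the broker accepted the message
            db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        relayed += 1
    return relayed


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE outbox ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " topic TEXT NOT NULL,"
    " payload TEXT NOT NULL)"
)
with db:
    db.executemany(
        "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
        [("orders", "evt-1"), ("orders", "evt-2")],
    )

broker = FlakyBroker(failures=1)
first_run = relay_once(db, broker)   # broker is down: nothing leaves the outbox
second_run = relay_once(db, broker)  # broker is back: both messages are relayed
remaining = db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

Note that if the relayer stops between publishing a message and deleting its row, the message is published again on the next run. Delivery is therefore at least once, which is one more reason the subscriber has to be idempotent.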
Conclusion
With all that said, we know there are benefits to be reaped from a microservices-based architecture, but we need to be doubly sure of what the boundaries of our microservices are and, most importantly, that they align with the business domains. We should also strive to make our services less chatty and, last but not least, avoid distributed transactions as much as possible. Where they cannot be avoided, try to leverage the practices outlined above in the Fallback option(s) of the various integration approaches.