Wednesday, August 4, 2021

Retry or Stability Pattern

Image: Retry or Stability Pattern (photo by Brett Jordan)

One of the key characteristics of a microservices architecture is inter-service communication. We split a monolithic application into multiple smaller applications called microservices. Each microservice is responsible for a single feature or domain and can be deployed, scaled, and maintained independently.

Since microservices are distributed by nature, things can go wrong at any point in time. The network over which we access other services, or the services themselves, can fail. There can be intermittent network connectivity errors or firewall issues. Individual services can fail due to unavailability, coding issues, out-of-memory errors, deployment failures, hardware failures, and so on. To make our services resilient to these failures, we adopt the retry pattern, which is also known as the stability pattern.

Retry Pattern

The idea behind the retry pattern is quite simple. If service A calls service B and receives an unexpected response (for example, an error or a timeout), service A sends the same request to service B again, hoping to get the expected response.

Figure: Retry Pattern Representation

There are several retry strategies that can be applied depending on the failure type or nature of the requirements.

Immediate Retry

This is the most basic strategy. The calling service handles the unexpected failure and immediately makes the request again. It is useful for unusual failures that occur intermittently, where the chances of success are high just by retrying.
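A minimal sketch of an immediate retry in Python. The function name `retry_immediately` and the `max_attempts` parameter are assumptions made for illustration, and a real service would normally catch only the errors it knows to be transient rather than every exception.

```python
def retry_immediately(operation, max_attempts=3):
    """Call operation(); if it raises, call it again right away.

    `operation` is any zero-argument callable (hypothetical here,
    e.g. a function wrapping a request to service B).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for _ in range(max_attempts):
        try:
            return operation()
        except Exception as error:  # assumption: every error is retryable
            last_error = error      # remember it and try again immediately
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

Because there is no pause between attempts, this only helps with short-lived glitches; a service that is down for longer will simply fail `max_attempts` times in quick succession.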

Retry After Delay

In this strategy, we introduce a delay before retrying the service call, hoping that the cause of the fault will have been rectified in the meantime. Retry after delay is appropriate when a request times out because the service is busy, has failed, or is affected by a network-related issue.
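The fixed-delay variant could look roughly like the sketch below. The names `retry_after_delay`, `delay_seconds`, and the injectable `sleep` argument are assumptions for illustration; passing `sleep` in makes the function easy to exercise without actually waiting.

```python
import time


def retry_after_delay(operation, delay_seconds=0.5, max_attempts=3,
                      sleep=time.sleep):
    """Call operation(); after a failure, wait a fixed delay and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:  # assumption: every error is retryable
            if attempt == max_attempts:
                raise              # no attempts left, propagate the error
            sleep(delay_seconds)   # give the service time to recover
```

With the defaults, a call that keeps failing is attempted three times with a half-second pause between attempts.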

Sliding Retry

In this strategy, the service keeps retrying the call, adding an incremental delay on each subsequent attempt. For example, the first retry may wait 500 ms, the second 1000 ms, and the third 1500 ms, for as long as the retry count has not been exceeded. By increasing the delay, we reduce how often we hit the service and avoid adding load to a service that is already overloaded.
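As a rough sketch, the sliding schedule from the example (500 ms, 1000 ms, 1500 ms) can be generated by multiplying a fixed step by the retry number. The helper names `sliding_delays_ms` and `retry_with_sliding_delay` are hypothetical, as is the injectable `sleep` argument.

```python
import time


def sliding_delays_ms(step_ms=500, max_retries=3):
    """Linearly increasing delays: step, 2 * step, 3 * step, ..."""
    return [step_ms * n for n in range(1, max_retries + 1)]


def retry_with_sliding_delay(operation, step_ms=500, max_retries=3,
                             sleep=time.sleep):
    """Try once, then retry with a delay that grows by step_ms each time."""
    try:
        return operation()  # initial attempt, no delay
    except Exception as error:  # assumption: every error is retryable
        last_error = error

    for delay_ms in sliding_delays_ms(step_ms, max_retries):
        sleep(delay_ms / 1000)  # time.sleep expects seconds
        try:
            return operation()
        except Exception as error:
            last_error = error

    # Retry count exceeded: propagate the most recent error.
    raise last_error
```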

Retry with Exponential Backoff

In this strategy, we take the Sliding Retry strategy and ramp up the retry delay exponentially instead of linearly. If we start with a 500 ms delay and double it after every failure, the retries wait 500 ms, then 1000 ms, then 2000 ms. Here we are trying to give the service more time to recover before we invoke it again.
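A minimal sketch of an exponential schedule, assuming the delay is multiplied by a constant factor on every retry. The factor of 2, the optional `cap_ms` upper bound, and the function name `exponential_delays_ms` are all assumptions for illustration; the resulting list can be fed into any retry loop that sleeps between attempts.

```python
def exponential_delays_ms(base_ms=500, factor=2, max_retries=4, cap_ms=None):
    """Delays that grow geometrically: base, base * factor, base * factor**2, ...

    If cap_ms is given, no single delay exceeds it.
    """
    delays = []
    for retry_index in range(max_retries):
        delay = base_ms * factor ** retry_index
        if cap_ms is not None:
            delay = min(delay, cap_ms)
        delays.append(delay)
    return delays
```

For comparison, a linear (sliding) schedule with a 500 ms step reaches 2000 ms on the fourth retry, while doubling from 500 ms reaches 4000 ms, so the struggling service is left alone for longer as failures accumulate.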

Abort Retry

We can't let a retry process go on forever. We need a threshold on the maximum number of retry attempts for a failed service call. We maintain a counter, and when it reaches the threshold, the best strategy is to abort the retry process and let the error propagate to the calling service.
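One way to sketch the abort step is to count attempts and, once the threshold is reached, raise a dedicated error that carries the attempt count and the original failure. The exception class `RetriesExhausted` and the function `call_with_retry_limit` are hypothetical names used only for this example.

```python
class RetriesExhausted(Exception):
    """Raised when the retry threshold is reached without a success."""

    def __init__(self, attempts, last_error):
        super().__init__(f"gave up after {attempts} attempts: {last_error}")
        self.attempts = attempts
        self.last_error = last_error


def call_with_retry_limit(operation, max_attempts=3):
    """Retry operation() until it succeeds or max_attempts is reached."""
    attempts = 0
    while True:
        attempts += 1  # the counter described above
        try:
            return operation()
        except Exception as error:  # assumption: every error is retryable
            if attempts >= max_attempts:
                # Threshold reached: abort and let the caller handle it.
                raise RetriesExhausted(attempts, error) from error
```

The calling service can then catch `RetriesExhausted` in one place and decide how to report the failure upstream.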

Conclusion

The retry pattern allows the calling service to retry failed attempts in the hope that the service will respond within an acceptable time.

By varying the interval between retries, we give the dependent service more time to recover and respond to our request.

It is recommended to keep track of failed operations, as this is very useful for finding recurring errors and for sizing the required infrastructure, such as thread pools and threading strategy.

At some point, we need to abort the retry, acknowledge that the service is not responding, and notify the calling service with an error.
