Last Updated on 30/05/2021 by Patryk Bandurski
Our APIs should be fault-tolerant, and there are many strategies for achieving this. Previously I described a possible scenario with the circuit breaker pattern. Today I want to remind you of another, simpler way: retry! We can retry the operation a couple of times before falling back to a different fault tolerance pattern or giving up on the message. Let’s see how it works in Mule.
Our case is plain and simple. We have an API that serves asynchronous operations. That means that an immediate response with the 202 Accepted status is returned once the user calls our endpoint. In the meantime, the request is being processed in the integration layer. Unfortunately, the service we call, hosted on AWS infrastructure, has some issues. As a result, our callout ends with an HTTP 500 error response.
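To make the scenario concrete, here is a rough sketch of how such an endpoint could look in Mule 4. It is only an illustration under assumptions: the names submit-request-flow, process-request-flow, HTTP_Listener_config and the /requests path are made up, and the fragment expects the http namespace to be declared on the enclosing mule element.

```xml
<flow name="submit-request-flow">
    <!-- HTTP_Listener_config and /requests are hypothetical names -->
    <http:listener doc:name="Listener" config-ref="HTTP_Listener_config" path="/requests">
        <!-- Respond with 202 Accepted instead of the default 200 -->
        <http:response statusCode="202" />
    </http:listener>
    <!-- The Async scope hands the work off; the main flow does not wait for it -->
    <async doc:name="Async">
        <!-- process-request-flow is a hypothetical flow that performs the callout -->
        <flow-ref doc:name="Process request" name="process-request-flow" />
    </async>
    <set-payload doc:name="Acknowledge" value='#[output application/json --- { status: "accepted" }]' />
</flow>
```

The caller gets the acknowledgement right away, while the callout (and any retrying) happens in the referenced flow.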
However, we have implemented a retry mechanism. Our API will retry the call to the AWS service up to three times. If we reach that threshold, we stop processing the request. We can, for example, write it to a DLQ (dead-letter queue) for later processing.
The retry mechanism is meant for transient failures. What are transient failures? These are communication issues with external services, such as service unavailability or another temporary problem on the service side like server overload. It is an ephemeral, temporary state. In other words, it is highly likely that the call will succeed when we try again.
You should be aware that not every operation is a good fit for the retry pattern. In REST APIs, we should repeat only idempotent HTTP methods such as GET, HEAD, OPTIONS, PUT and DELETE. For SOAP APIs, it is more complicated, and it depends on the operation’s logic.
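One possible way to enforce this rule in a flow is to check the method before choosing the retrying path. The fragment below is only a sketch. It assumes the flow starts with an HTTP Listener (so attributes.method is available), that the incoming request is forwarded with the same method, and that two flows named call-with-retry and call-once exist; both names are hypothetical.

```xml
<choice doc:name="Idempotent method?">
    <!-- Only methods that are safe to repeat take the retrying path -->
    <when expression="#[['GET', 'HEAD', 'OPTIONS', 'PUT', 'DELETE'] contains attributes.method]">
        <!-- call-with-retry is a hypothetical flow that wraps the request in a retry -->
        <flow-ref doc:name="Call with retry" name="call-with-retry" />
    </when>
    <otherwise>
        <!-- call-once is a hypothetical flow that sends the request a single time -->
        <flow-ref doc:name="Call once" name="call-once" />
    </otherwise>
</choice>
```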
REST API error handling is pretty straightforward due to HTTP status groups. Below you can see all five groups. It makes sense to retry server errors 5xx. The rest of the status codes do not indicate errors that should be retried … maybe except HTTP 429 Too Many Requests related to throttling.
|Status Code group|Description|
|---|---|
|1xx|Informational|
|2xx|Success|
|3xx|Redirection|
|4xx|Client error|
|5xx|Server error|
The last thing to consider is the interval between retries. If it is too short, the service may not get the time it needs to recover.
One of the core components, called Until Successful, can be used for retrying the request. The scope configuration is pretty simple. You need to provide:
- number of attempts (the maxRetries attribute)
- number of milliseconds between retries (the millisBetweenRetries attribute)
Below is a snippet from the application. It is configured to retry the request to OpenWeather up to three times in case of any error. The Mule engine waits ten seconds before each retry.
<until-successful doc:name="Until Successful"
                  maxRetries="3"
                  millisBetweenRetries="10000">
    <http:request method="GET" doc:name="Request"
                  config-ref="OpenWeather_configuration"
                  path="/weather?q=London" />
</until-successful>
The retries are made at even intervals. The Until Successful scope treats any error as a trigger for retrying, which may not be a correct assumption. As I wrote earlier, some errors are caused by the client, and there is no use repeating those requests. For instance, the API secret key has expired. We won’t fix the situation by repeating the request with the obsolete API key. The same goes for non-transient failures.
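When the attempts run out, Until Successful raises a MULE:RETRY_EXHAUSTED error, which is a natural place to hook in the DLQ step from the scenario described at the beginning. Below is a minimal sketch, not a definitive implementation. It assumes a JMS connector configuration named JMS_Config and a queue named weather.dlq; both names are made up for illustration. The fragment also expects the http and jms namespaces to be declared on the enclosing mule element.

```xml
<try doc:name="Try">
    <until-successful doc:name="Until Successful" maxRetries="3" millisBetweenRetries="10000">
        <http:request method="GET" doc:name="Request"
                      config-ref="OpenWeather_configuration"
                      path="/weather?q=London" />
    </until-successful>
    <error-handler>
        <!-- Raised by Until Successful once all retries have failed -->
        <on-error-continue doc:name="On Error Continue" type="MULE:RETRY_EXHAUSTED">
            <logger level="ERROR" doc:name="Retries exhausted"
                    message="Retries exhausted, sending message to DLQ" />
            <!-- JMS_Config and weather.dlq are hypothetical names -->
            <jms:publish doc:name="Publish to DLQ" config-ref="JMS_Config" destination="weather.dlq" />
        </on-error-continue>
    </error-handler>
</try>
```

Any other messaging connector could take the place of the JMS publish here; the point is only that the exhausted retry is caught and the message is parked for later processing.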
Until Successful with the Try scope
To exclude non-transient failures and client errors from being retried, I put a Try scope with an On Error Continue strategy inside Until Successful. I configured the error handling strategy to accept only 4xx errors like bad request, unauthorized, forbidden, etc.
<until-successful doc:name="Until Successful" maxRetries="3" millisBetweenRetries="10000">
    <try doc:name="Try">
        <http:request method="GET" config-ref="OpenWeather_configuration" path="/weather?q=London" />
        <error-handler>
            <on-error-continue doc:name="On Error Continue"
                               type="HTTP:BAD_REQUEST, HTTP:CLIENT_SECURITY, HTTP:FORBIDDEN, HTTP:METHOD_NOT_ALLOWED, HTTP:NOT_ACCEPTABLE, HTTP:NOT_FOUND, HTTP:UNAUTHORIZED">
                <logger level="WARN" doc:name="Skip" message="Skip retry"/>
            </on-error-continue>
        </error-handler>
    </try>
</until-successful>
The Try scope inside the Until Successful scope allows us to exclude some errors from being retried. Because On Error Continue handles the error, the Try scope completes successfully and Until Successful stops retrying. Using this approach, we can skip both non-transient and client errors. As a result, our retry mechanism applies only to cases when it makes sense to call the service again.
The marriage of the Until Successful and Try scopes gives us a simple retry mechanism. We can use the Try scope to exclude some errors from being retried. We should retry only transient failures, not client errors. Of course, you should think carefully about the interval between reattempts and the maximum number of attempts. Remember as well to use retry only for idempotent methods.
This idea has some disadvantages. Imagine that the service you are calling is overloaded. By retrying our calls, we can overload the service even more and prolong the outage. To overcome this, you should think carefully about the intervals or use another strategy such as exponential back-off. The latter is not that easy to implement in MuleSoft, but I will cover it in one of the next articles :).