Last Updated on 27/03/2021 by Patryk Bandurski
It is good practice to monitor your service and check whether it is available and performing as expected. To do this, we need to define what the term service health actually means. In this article, I will present two different definitions; however, keep in mind that you can have your own project-specific definition. All examples are prepared in Mule ESB 4.2. If you are familiar with Spring Boot Actuator, you will notice some interface similarities. I have decided to follow the Spring approach as it is clear and easy to read.
Service’s health
To monitor your services efficiently, a set of health conditions should be chosen. It may be a universal list, but it may as well be tailored to your project's specifics. Here are a couple of ideas that you can use:
- has the service started?
- does the service have a reachable endpoint?
- has the runtime created an anchor file?
- has the service successfully established a connection with another system via HTTP or any other protocol?
- has the service established a connection with another system within a defined threshold?
- etc.
As you can see, this is just a starting point, and the list can be extended as needed. Keep in mind that health checks should be quick and simple; overly complex checks lead to difficult maintenance. I decided to present two approaches. The first one focuses entirely on the question of whether my service has started. The second one is more sophisticated, as I also want to know whether my service has established a connection within an acceptable threshold.
Anchor file
Health check condition: has our service been deployed?
We have a couple of ways to check whether a service is running in Mule ESB. First of all, we may look at mule-ee.log. After the Mule service starts, you should see a table with application start-up statuses in the log file, as in the screenshot below. We can tell that the health-check application from the default domain has been DEPLOYED. Mule will set it to FAILED in case of any error.
The Mule runtime creates a file named [application name]-anchor.txt when the service is deployed correctly. Note that the txt extension is used on both Windows and Linux systems. In this scenario, we need to check for the file's existence within the apps directory. Using the previous example, I would look for health-check-anchor.txt. If my monitoring tool does not find this file, I should receive an alert that something went wrong.
Endpoint’s health
Spring Boot Actuator
While I was implementing microservices using Spring Boot, I encountered the Spring Boot Actuator library. This library enables a couple of simple endpoints; the most important for me were /health and /info. The first one, shown below, allowed me to check my application's status easily. As you can see, although configService and hystrix are marked as UP, the overall status is DOWN. This means that some other conditions were not satisfied.
Simple health check
Health check condition: Has our service been deployed? Does the service run?
How can we achieve that scenario? Mule does not have something like a health endpoint that would allow us to check whether the service is running or not. I think the easiest way is to enable an HTTP listener on a specific URI like /health. At this address, we should receive clear status information. As in the diagram below, this can be as simple as the service always returning status UP with a 200 HTTP status code.
If I cannot reach the /health endpoint, I know immediately that something is wrong with my service. On the other hand, if I receive any response, I will mark my service as running and working as expected. Let's see something more complex.
Complex health check
Health check condition: Has our service been deployed? Does the service run? Has the service established a connection with the external system within a defined timeout threshold?
In comparison with the previous simple health check, here we have higher expectations of our service. We expect the service to connect with an external system over HTTP or to query a database using a simple select statement. What is more, we may require some timeout threshold to be met. The diagram below depicts a simple process.
In the presented example, we perform three different checks in parallel: two external HTTP calls and one DB call. For each call, we perform custom status verification. For an HTTP call, it could be a check whether a 200 or 201 HTTP status code has been returned. After all steps have been performed, we compute the overall service status. Usually, if one of the calls is marked as DOWN, the service status is also reflected as DOWN. The most complex parts here are "Verify status" and "Compute status". In these two actions, you can put as much custom logic as you need.
HTTP status code
If you decide to expose the service's status using a REST endpoint, you should also consider changing the returned HTTP status. It is good practice to return a 200 code for status UP and 503 in case of status DOWN. Why? 200 means OK, and I reckon that a DOWN status is definitely not OK. Most of all, client code will notice that a 5xx code occurred, and this is an exceptional situation that requires some action.
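In Mule 4, one way to do this is to drive the listener's response status code from the computed status. The snippet below is only a sketch; the config name and the status variable are my assumptions, not part of the original project:
<http:listener config-ref="health_Listener_config" path="/health">
    <!-- return 200 when the computed status is UP, 503 otherwise -->
    <http:response statusCode="#[if (vars.status == 'UP') 200 else 503]" />
</http:listener>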
Implementation
After this brief introduction to the service's health status, it is time to see the implementation in Mule ESB. I have prepared one application that exposes a /health endpoint. This endpoint accepts only GET requests and returns content in JSON.
Simple scenario
The first and easiest scenario is to always return the UP status. As you can see, we perform this in three steps. We could do it in only one step; however, I decided to have a more generic flow. In consequence, only the first message processor will change. More about this in the next section.
What this flow actually does is set the status to successful. After calling GET /health we should always receive:
{ "status": "UP" }
This solution is fairly simple, but it may fulfill your needs. If you have more sophisticated requirements, like checking whether we have established a connection or whether we get a response within specified time boundaries, go to the next section.
Verifying connection
The flow health-status-flow is far more complex. First of all, we have a Scatter-Gather that calls two private flows concurrently. The next two steps are similar to what you already saw: computing the status and preparing a final response.
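The skeleton of such a flow could look like the sketch below; the private flow names are my assumptions:
<flow name="health-status-flow">
    <http:listener config-ref="health_Listener_config" path="/health" allowedMethods="GET" />
    <scatter-gather>
        <route>
            <flow-ref name="check-http-service-flow" />
        </route>
        <route>
            <flow-ref name="check-db-flow" />
        </route>
    </scatter-gather>
    <!-- compute the overall status and prepare the final JSON response -->
    <flow-ref name="compute-status-flow" />
</flow>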
I am expecting a structure like the example below:
{
"status":"DOWN",
"details":[
{
"serviceType":"http",
"status":"DOWN",
"errorCode":"THRESHOLD BREACHED",
"statusCode":200
},
{
"serviceType":"db",
"status":"DOWN",
"errorCode":"CONNECTIVIT"
}
]
}
In comparison with the previous example, I now have a details array. Each item is a specific health check. For this particular example:
- getting the response took longer than expected
- the connection to the database did not work due to connectivity issues
As a result, the overall status is DOWN.
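For completeness, the "Compute status" step could be expressed in DataWeave along these lines. This is only a sketch and assumes that each Scatter-Gather route returned a message whose payload holds a serviceHealth object:
%dw 2.0
output application/json
// assumption: the scatter-gather result holds one message per route,
// each carrying { serviceHealth: { status: ..., ... } } as its payload
var details = payload pluck $.payload.serviceHealth
---
{
    status: if (details.status contains "DOWN") "DOWN" else "UP",
    details: details
}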
Connecting to the HTTP endpoint
The flow that checks health performs a request and then computes the status. The logic is fairly simple: if the HTTP response status code is 200, then the service's status is UP. By default, Mule ESB would throw an exception for codes greater than or equal to 400. We need to suppress this behavior. To treat any status code as a success, we need to configure the HTTP Request's response validator like below:
<http:response-validator >
<http:success-status-code-validator values="100..599" />
</http:response-validator>
Why have I decided on the range from 100 to 599? Because this is the standard range, and I should not receive anything outside it.
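For context, the whole request step could look like the sketch below; the config name, path, and the service variable name are my assumptions, not part of the original flow:
<http:request method="GET" config-ref="external_Request_config" path="/ping">
    <http:response-validator>
        <http:success-status-code-validator values="100..599" />
    </http:response-validator>
</http:request>
<!-- keep the status code and reason phrase for the status computation -->
<set-variable variableName="service" value="#[{ statusCode: attributes.statusCode, reasonPhrase: attributes.reasonPhrase }]" />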
If you are not up to date with the newest match and if/else DataWeave syntax, you may find it useful to read the article DataWeave – Tip #1. To keep it short: the following transformation sets the status and adds the errorCode and statusCode properties when the status equals "DOWN".
%dw 2.0
output application/java
---
{
    serviceHealth: do {
        // the service is UP only when the external call returned HTTP 200
        var status = if (vars.service.statusCode == 200) "UP" else "DOWN"
        ---
        {
            serviceType: "http",
            status: status,
            (status match {
                case "DOWN" -> {
                    errorCode: vars.service.reasonPhrase,
                    statusCode: vars.service.statusCode
                }
                else -> {}
            })
        }
    }
}
Timeout threshold
We may also extend the conditions and expect to receive a response within a specified time range. Both conditions should be fulfilled to consider the service as running:
- the HTTP response status code is 200
- the connection time is less than the defined threshold (if a threshold is specified)
In the case of a breached threshold, I would like to provide an error code. Here is an excerpt from the transformation:
...
errorCode: vars.service.reasonPhrase match {
    case met if thresholdMet -> met
    else -> "THRESHOLD BREACHED"
},
...
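To evaluate thresholdMet, the flow has to know how long the call took. One way to obtain it (the variable names below are mine, not from the original flow) is to capture timestamps around the request and compute the elapsed milliseconds:
<set-variable variableName="startedAt" value="#[now() as Number {unit: 'milliseconds'}]" />
<http:request method="GET" config-ref="external_Request_config" path="/ping">
    <http:response-validator>
        <http:success-status-code-validator values="100..599" />
    </http:response-validator>
</http:request>
<!-- elapsed time of the call in milliseconds -->
<set-variable variableName="elapsedMillis" value="#[(now() as Number {unit: 'milliseconds'}) - vars.startedAt]" />
In the transformation, thresholdMet could then be defined as something like vars.elapsedMillis <= vars.thresholdMillis (again, hypothetical variable names).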
Connecting to DB
How can we check database health? In Mule ESB, we need to use a Try scope to handle any exceptions thrown during the call to the database. We can use On Error Continue to continue our flow. Then, in a Transform Message, we check whether we received an error during the call and set the status appropriately.
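A sketch of that part could look like the snippet below; the config name and the query are placeholders (adjust the select statement to your database):
<try>
    <db:select config-ref="Database_Config">
        <db:sql>SELECT 1</db:sql>
    </db:select>
    <error-handler>
        <on-error-continue type="ANY">
            <!-- remember that the call failed so the transformation can report status DOWN -->
            <set-variable variableName="dbError" value="#[error.errorType.identifier]" />
        </on-error-continue>
    </error-handler>
</try>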
Source Code
The source code is available on my GitHub account: https://github.com/dyeeye/health-check. If you have any comments or questions regarding the code, don't hesitate to write to me.
Summary
To check whether a Mule ESB service has been deployed correctly, we can use anchor files. In advanced scenarios where the conditions are much more complex, it is worth exposing a /health endpoint that informs about the service's status. We can define a threshold, we can perform simple calls to the DB, and so on. It is totally up to you and your requirements. Bear in mind that checks should not be too complex, as they may become cumbersome to maintain.
If you find this article interesting please share it. Cheers 🙂