This article continues the microservices series, so if you haven't read the previous article, I suggest you read that first (here). In this article, we will look at the challenges we all face while developing microservices. Let us start with a short summary of what we learned in the last article.
Microservices Definition
A microservices architecture breaks a monolithic application into several small, separate applications that communicate with each other via API calls, with each service built around a single feature.
It is a different way of building applications from the traditional monolithic way. In the traditional approach there may be one code base or several, but what gets deployed is one entity: it runs on one server and scales as a single unit.
Fault Tolerance
When working with microservices, we all hit cases where a service does not return the expected response, or returns an error code when we call one of its mapped endpoints. This is known as a fault in the service.
Fault tolerance, in the context of microservices, asks: given an application, if there is a fault, what is its impact, and how much tolerance does the system have for that specific fault? If one microservice goes down, does the whole application go down, does only a part of its functionality go down, or is there a way to handle the failure so that there is no perceived impact at all? The tolerance your system has for a particular fault is called fault tolerance.
Resilience
Resilience goes hand in hand with fault tolerance: the number of faults a system can tolerate indicates how resilient it is. Another part of resilience is how well a system can bounce back from a fault.
In other words, is there a mechanism that lets the system recover and keep working after a fault?
Suppose we have a main application that calls two other microservices to build a detailed response. If the whole main application goes down whenever either of those services goes down, the system is neither fault tolerant nor resilient.
Problems with Microservices
Scenario 1 - What if a microservice goes down?
A service, or one of its instances, might go down, so we should design the architecture to handle that case.
In this case, we can run multiple instances on different machines, or on different ports of the same machine, and register them with a discovery server (Eureka server). Clients can then spread requests across the instances using a technique called round robin, which also helps balance the load on the service.
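To make the round robin idea concrete, here is a minimal sketch in plain Java. It is not how Eureka or any client-side load balancer is implemented; the class name RoundRobinSelector and the instance addresses are made up for illustration, and the list of instances is assumed to come from the discovery server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side selector: cycles through the known instances in order.
class RoundRobinSelector {
    private final List<String> instances;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSelector(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        this.instances = new ArrayList<>(instances); // defensive copy
    }

    // Returns the next instance in rotation. floorMod keeps the index valid
    // even after the counter overflows to a negative number.
    String next() {
        int index = Math.floorMod(counter.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}
```

With two instances registered, successive calls to next() alternate between them, so each instance receives roughly half of the requests.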
To run multiple instances
When running the war (or jar), set the port and profile properties on the command line as follows:
java -jar -Dserver.port=8011 -Dspring.profiles.active=production demo-0.0.1-SNAPSHOT.jar
java -jar -Dserver.port=8012 -Dspring.profiles.active=production demo-0.0.1-SNAPSHOT.jar
This way we create two instances of the same service running on different ports, and this can be verified in the discovery server they register with.
Scenario 2 - What if a microservice is slow?
A service might also be slow, which is a bigger problem. To deal with it, we need to look at how threads are used in Java.
Suppose your main application makes a request to another microservice, and that microservice requests data from an external instance that is slow. Since the main application communicates with all the other microservices, one slow service slows down the main application.
It makes sense that a call to a slow service is itself slow, but the slowness of one instance of one service should not make the whole application slow.
So to understand this problem and its solution, we start with threads.
How do threads work in a web server?
Suppose we have a web server. When a request comes in, the server assigns a thread to handle it; the thread executes, returns the response, and is then released. The same happens for every other request. This is the typical request-handling process.
But what if a request takes a while to execute and send its response? In the meantime a new request comes in, so another thread is assigned and waits; then another request arrives, and another thread waits for a response.
If requests arrive faster than threads are released, we end up with a bunch of waiting threads that consume all the resources until the server hits its maximum thread count. The consumer then experiences slowness in the service.
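The thread exhaustion described above can be reproduced with a small sketch using only java.util.concurrent. This is an illustration under simplifying assumptions, not a model of any particular web server: the class name ThreadExhaustionDemo is made up, the "slow service" is simulated with a latch that is held closed while requests arrive, and the pool rejects extra work instead of queueing it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ThreadExhaustionDemo {
    // Submits `requests` slow tasks to a pool of at most `maxThreads` workers and
    // returns how many were rejected because every worker was already busy.
    static int countRejected(int maxThreads, int requests) {
        CountDownLatch slowDependency = new CountDownLatch(1); // stays closed while requests arrive
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            slowDependency.await(); // simulates waiting on a slow service
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // no free thread: this request cannot be served
                }
            }
        } finally {
            slowDependency.countDown(); // let the waiting threads finish
            pool.shutdown();
        }
        return rejected;
    }
}
```

Calling countRejected(2, 5) ties up both threads with the first two requests, so the remaining three are turned away. A real server would typically queue requests first, but the effect on the consumer is the same: requests wait or fail because one slow dependency holds all the threads.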
One solution to this is a timeout.
Using timeout
A timeout puts an upper bound on how long a call may take. When a thread is waiting on a slow call, we give it a time limit. If the response does not arrive within that period, an exception is thrown, control moves to the catch block, and the thread stops waiting.
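The timeout behaviour described above can be sketched in plain Java before we look at RestTemplate. The class name TimeoutCall and the fallback string are hypothetical; the sketch simply waits a bounded time for a result and gives up otherwise.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutCall {
    // Runs `call` on a worker thread and waits at most `timeoutMillis` for the result.
    // On timeout or failure, the pending work is abandoned and `fallback` is returned.
    static String callWithTimeout(Callable<String> call, long timeoutMillis, String fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow call so its thread is released
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A call that answers within the limit returns its own result; a call that sleeps past the limit lands in the TimeoutException branch and the caller gets the fallback value instead of waiting indefinitely.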
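Option 1: Timeout with RestTemplate
@Bean
@LoadBalanced
public RestTemplate restTemplate() {
    // First way of creating timeouts: configure them on the request factory
    HttpComponentsClientHttpRequestFactory clientHttpRequestFactory = new HttpComponentsClientHttpRequestFactory();
    // At most 5 seconds to establish the connection
    clientHttpRequestFactory.setConnectTimeout(5000);
    // At most 5 seconds to wait for the response (setter available in Spring 5.x)
    clientHttpRequestFactory.setReadTimeout(5000);
    return new RestTemplate(clientHttpRequestFactory);
}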
This gives us a RestTemplate with a timeout of 5 seconds: whenever we make a call, the RestTemplate waits, and as long as the response comes back within 5 seconds all is good; otherwise it throws an error.
But this still does not solve the problem. Any thread that takes more than 5 seconds times out, but what if requests keep arriving faster than timed-out threads are released? Each slow request still holds a thread for up to 5 seconds, so with enough incoming requests we are back to the same issue. A timeout eases the problem a little, but we will still hit the thread limit, so it is only a partial solution.
To solve this issue completely, we can detect which service is slow and deal with it smartly. Instead of sending requests blindly and eventually running out of resources, we stop sending requests for a while. After that, we check whether the service is behaving as expected: if it is, business carries on as usual; if it is misbehaving again, we stop sending requests again.
This is a popular pattern for fault tolerance in microservices.
Steps to take first:
- Detect which service is slow.
- Take some temporary steps to avoid making things worse.
- Deactivate the problem component so that it doesn’t affect downstream components.
So we need a mechanism that breaks the normal flow and prevents things from getting worse. This is the Circuit Breaker Pattern.
Option 2: The Circuit Breaker Pattern
To understand this pattern, take an electrical example: whenever there is a spike in the current, something put in place breaks the circuit. Once the spike is over, the circuit is restored, either manually, by going and switching the breaker back on, or automatically once everything works fine again. A software circuit breaker works the same way.
According to Wikipedia, a circuit breaker's basic function is to interrupt current flow after a fault is detected. Unlike a fuse, which operates once and then has to be replaced, a circuit breaker can be reset to resume normal operation.
So how can we use circuit breakers in our code? Technically, we can use one everywhere we make a call to another API, because making a call can consume resources.
Circuit Breaker parameters
- Last n requests to consider for the decision (here n is the number of most recent requests examined)
- How many of those fail?
- Timeout duration
- How long to wait before trying again (sleep window)
Let us explain it with an example. Consider the dummy data below:
- Last n request to consider for decision = 5
- How many of those fail = 3
- Timeout duration = 3 seconds
- How long to wait (sleep window) = 10 seconds
Now requests to a microservice start arriving. The first request completes in 100 ms, so it is a success; the second takes 4 seconds, so it fails; the third takes 200 ms and succeeds; the fourth takes 5 seconds and fails; and the fifth takes 5 seconds and fails.
At this point, 3 of the last 5 requests have failed, so the circuit breaks and no more requests are sent for 10 seconds, i.e. the sleep window.
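The worked example above can be turned into a small count-based breaker. This is a simplified sketch, not Hystrix's algorithm: the class SimpleCircuitBreaker and its method names are hypothetical, and the current time is passed in explicitly so the behaviour is easy to follow and test.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative count-based circuit breaker driven by the four parameters above.
class SimpleCircuitBreaker {
    private final int windowSize;          // last n requests to consider
    private final int failureThreshold;    // failures in the window that break the circuit
    private final long timeoutMillis;      // slower calls count as failures
    private final long sleepWindowMillis;  // how long to stop sending requests
    private final Deque<Boolean> recentFailures = new ArrayDeque<>();
    private long openedAt = -1;            // -1 means the circuit is closed

    SimpleCircuitBreaker(int windowSize, int failureThreshold,
                         long timeoutMillis, long sleepWindowMillis) {
        this.windowSize = windowSize;
        this.failureThreshold = failureThreshold;
        this.timeoutMillis = timeoutMillis;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    // Records one completed request: how long it took and when it finished.
    synchronized void record(long durationMillis, long nowMillis) {
        recentFailures.addLast(durationMillis > timeoutMillis);
        if (recentFailures.size() > windowSize) {
            recentFailures.removeFirst(); // keep only the last n outcomes
        }
        long failures = recentFailures.stream().filter(failed -> failed).count();
        if (failures >= failureThreshold) {
            openedAt = nowMillis;   // break the circuit
            recentFailures.clear(); // start fresh after the sleep window
        }
    }

    // True if a request may be sent at `nowMillis`.
    synchronized boolean allowRequest(long nowMillis) {
        if (openedAt < 0) {
            return true;
        }
        if (nowMillis - openedAt >= sleepWindowMillis) {
            openedAt = -1; // sleep window elapsed: try the service again
            return true;
        }
        return false;
    }
}
```

Created as new SimpleCircuitBreaker(5, 3, 3000, 10000) and fed the five durations from the example (100, 4000, 200, 5000 and 5000 ms), the breaker opens on the fifth request and allowRequest keeps returning false until 10 seconds have passed.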
While the circuit is open, the application still receives requests but cannot get a response from the broken service, so we handle them with a FALLBACK mechanism. It tells the system: whenever the circuit breaks, don't do the usual thing; do whatever the fallback method says instead.
We can implement a fallback mechanism by:
- Throwing an error. (Not preferred)
- Having a default fallback response. (Way better)
- Saving the previous response in a cache and using that when possible. (Best way)
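The second and third options in the list above can be combined in one small helper. This is a sketch under assumptions: the class name CachedFallback is made up, the remote call is modelled as a plain function that throws when the service is unavailable, and the cache is an in-memory map with no expiry.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: remembers the last good response per key and serves it on failure.
class CachedFallback<K, V> {
    private final Map<K, V> lastGood = new ConcurrentHashMap<>();
    private final Function<K, V> remoteCall;
    private final V defaultValue;

    CachedFallback(Function<K, V> remoteCall, V defaultValue) {
        this.remoteCall = remoteCall;
        this.defaultValue = defaultValue;
    }

    V get(K key) {
        try {
            // A null result is treated as a failure, like an exception.
            V fresh = Objects.requireNonNull(remoteCall.apply(key));
            lastGood.put(key, fresh); // remember the latest good response
            return fresh;
        } catch (RuntimeException e) {
            // Best case: the cached previous response; otherwise the default response.
            return lastGood.getOrDefault(key, defaultValue);
        }
    }
}
```

If the call for a user succeeded earlier, a later failure returns the cached response for that user; a user who was never served successfully gets the default response.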
Why Circuit Breakers?
Let us see why we should use circuit breakers.
Failing Fast
In the context of microservices, failing fast is a good thing, and a circuit breaker achieves it by failing calls to a struggling service immediately instead of making things worse.
Fallback Functionality
Circuit breakers provide us with fallback method functionality.
Automatic Recovery
This comes from the sleep window parameter of the circuit breaker pattern: once the window has passed, requests are tried again.
Now that we know what a circuit breaker is, and why and when to use it, it's time to implement it in our code. This can be done with the help of HYSTRIX.
What is HYSTRIX?
- An open source library originally created by Netflix. Netflix created a bunch of libraries that form part of an ecosystem of solutions for building microservices.
- Hystrix implements the circuit breaker pattern, so you don't have to write that logic yourself in your microservice.
- You just provide the configuration parameters that control when the circuit breaks and when it starts working again. We give all these parameters to Hystrix, and Hystrix does all the work.
- Works well with Spring Boot.
Note
Hystrix is no longer in the active development phase; it is in maintenance mode.
How to use HYSTRIX in any Spring Boot project?
Even though Hystrix is a Netflix project, it has been integrated with Spring. So we are going to use Hystrix in the Spring Boot environment. Let us add Hystrix to a Spring Boot application.
Add the Maven dependency spring-cloud-starter-netflix-hystrix.
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
    <version>2.2.6.RELEASE</version>
</dependency>
Add annotation @EnableCircuitBreaker to the main application class.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;

@SpringBootApplication
@EnableEurekaClient
@EnableCircuitBreaker
public class CircuitBreakerDemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(CircuitBreakerDemoApplication.class, args);
    }
}
Add @HystrixCommand to the methods that need circuit breakers (any method that makes an external API call).
@HystrixCommand(fallbackMethod = "getFallbackCatalogForUserId")
public List<CatalogItem> getCatalogForUserId(@PathVariable("userId") String userId) {
    // Call to Rating Service via RestTemplate
    UserRating listofRating = userRatingData.getUserRating(userId);
    return listofRating.getRatingList().stream()
            // for each movieId call MovieInfo and get details
            .map(rating -> movieInfo.getCatalogItem(rating))
            .collect(Collectors.toList());
}
Configure Hystrix behavior, i.e. provide the circuit breaker parameters discussed above.
import com.basic.moviecatalogservice.models.CatalogItem;
import com.basic.moviecatalogservice.models.Movie;
import com.basic.moviecatalogservice.models.Rating;
import com.basic.moviecatalogservice.models.UserRating;
import com.basic.moviecatalogservice.service.MovieInfo;
import com.basic.moviecatalogservice.service.UserRatingData;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

@RestController
@RequestMapping("/catalog")
public class MovieCatalogResource {
    @Autowired
    RestTemplate restTemplate;
    @Autowired
    MovieInfo movieInfo;
    @Autowired
    UserRatingData userRatingData;

    @RequestMapping("/{userId}")
    @HystrixCommand(fallbackMethod = "getFallbackCatalogForUserId")
    public List<CatalogItem> getCatalogForUserId(@PathVariable("userId") String userId) {
        // Call to Rating Service via RestTemplate
        UserRating listofRating = userRatingData.getUserRating(userId);
        return listofRating.getRatingList().stream()
                // for each movieId call MovieInfo and get details
                .map(rating -> movieInfo.getCatalogItem(rating))
                .collect(Collectors.toList());
    }

    // Fallback method for getCatalogForUserId
    public List<CatalogItem> getFallbackCatalogForUserId(@PathVariable("userId") String userId) {
        return Arrays.asList(new CatalogItem("No Movie", "", 0));
    }
}
Here we are telling Hystrix to call the fallback method whenever the circuit breaks. Remember, the signature of the fallback method should be the same as the signature of the method it backs up.
So whenever the circuit breaks, you will see the fallback method's response instead of the actual response.
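The circuit breaker parameters from the earlier example can also be passed through the annotation. The fragment below is a configuration sketch, not a complete class: it shows only the annotated method of MovieCatalogResource and additionally needs the import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty. The values are the dummy data used earlier, and the mapping is approximate, because Hystrix evaluates a rolling time window and a failure percentage rather than exactly "3 of the last 5 requests".

```java
@RequestMapping("/{userId}")
@HystrixCommand(
        fallbackMethod = "getFallbackCatalogForUserId",
        commandProperties = {
            // Timeout duration: a call slower than 3 seconds counts as a failure
            @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "3000"),
            // Minimum number of requests in the rolling window before the breaker can trip
            @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "5"),
            // Failure percentage that trips the breaker (3 of 5 requests is 60%)
            @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "60"),
            // Sleep window: how long to wait before letting a request through again
            @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "10000")
        })
public List<CatalogItem> getCatalogForUserId(@PathVariable("userId") String userId) {
    // Call to Rating Service via RestTemplate
    UserRating listofRating = userRatingData.getUserRating(userId);
    return listofRating.getRatingList().stream()
            // for each movieId call MovieInfo and get details
            .map(rating -> movieInfo.getCatalogItem(rating))
            .collect(Collectors.toList());
}
```

If no commandProperties are given, Hystrix falls back to its own defaults, so it is worth checking those defaults against the latency you expect from the services you call.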
How does HYSTRIX work?
Suppose we have an API class with a method annotated with @HystrixCommand. This tells Hystrix to provide circuit breaking for that method and to use the given fallback method.
Hystrix wraps your API class in a proxy class. Whenever something asks for an instance of this class, it does not get the class itself but an instance of the proxy class that Hystrix has created and wrapped around it. The proxy class contains the circuit breaker logic. When somebody makes a call, the proxy takes the call, passes it to the actual method, and examines what comes back. If the call fails, it checks whether a fallback exists. By constantly monitoring results in this way, it realizes when the circuit needs to break and redirects calls to the fallback method.
The annotation is what triggers the creation of this proxy logic, which handles everything. It looks at the parameters you set in the annotation to decide when the circuit breaks and, when it does, which fallback method to call. So this is how Hystrix works: through a proxy, or wrapper, class that contains the circuit breaker logic.
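The wrapping idea can be illustrated with a JDK dynamic proxy. This is only an analogy for the mechanism described above, not Hystrix's actual implementation: the interface CatalogApi and the class FallbackProxy are hypothetical, and the sketch covers only the "call the fallback when the real method fails" part, without any circuit breaker state.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical API that callers depend on.
interface CatalogApi {
    String getCatalog(String userId);
}

class FallbackProxy {
    // Returns a proxy that forwards each call to `target`; if the real method
    // throws, the same call is answered by `fallback` instead.
    static CatalogApi wrap(CatalogApi target, CatalogApi fallback) {
        return (CatalogApi) Proxy.newProxyInstance(
                CatalogApi.class.getClassLoader(),
                new Class<?>[] {CatalogApi.class},
                (proxy, method, args) -> {
                    try {
                        return method.invoke(target, args); // call the actual method
                    } catch (InvocationTargetException e) {
                        return method.invoke(fallback, args); // the call failed: use the fallback
                    }
                });
    }
}
```

Callers only ever hold the object returned by wrap, so they cannot tell whether a given answer came from the real method or from the fallback, which is the same experience Hystrix gives the consumers of an annotated method.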
So this is how we can use Hystrix to apply the circuit breaker pattern in our Spring Boot projects.
Summary
With this we come to the end of the article, but before finishing up let us summarize what we learned. We looked at the challenges we face when developing or working with any microservice application. We covered fault tolerance and resilience, and solved the slow-service problem with the circuit breaker pattern, implemented with Hystrix.
What did we learn?
- What are microservices?
- Fault Tolerance
- Resilience
- Circuit Breaker Pattern
- Hystrix: what it is, and when and how to use it.