This article presents an evolution of standard JSON APIs based on the author's experience. The solution is based on the Avro format, which significantly reduces communication time and network traffic between microservices. The article includes benchmark results comparing a JSON API with several variations of Avro APIs, plus implementation details in C# .NET Core.
Introduction
I am a software developer working in the C# .NET environment, focused mostly on the backend side of applications. That means I am delivering data. Fetching data. Synchronizing data. Downloading data. Checking data quality. Pulling data. Mixing together data from various sources to provide new data. I think you know what I am talking about.
Fortunately, I am living in a microservice world, where the data is well organized. The flagship project of my company is built of 40-50 services which expose about 500 endpoints in total. Even my side project is built of 6 services (20 APIs). I also use third-party APIs and open APIs. During this everyday work, I noticed a way to improve the most popular kind of API: the JSON API.
Believe it or not, services love to talk to each other. They do this all the time and that's good. My customers are able to see the data, manipulate it, and delete it. Background jobs are generating reports, documents, or whatever they want. The problem starts when the communication slows the services down and they are not able to do their job properly.
The problem
Some time ago, developers in my company were asked to limit calls performed against on-premise microservices, as, surprisingly, the bottleneck was local network bandwidth. The local servers were connected by a 100 Mb/s network and were unable to handle heavy traffic.
A few days later I heard a conversation between my colleague and his Product Owner. The PO asked if there was any quick win to improve the response time of his service. The colleague explained the root cause of the problem: his service was fetching data from APIs A, B, C, D, and E, so the final response time was strongly dependent on the connected services.
Then the colleague, who is a true professional, started to enumerate possible solutions: cache part of the data, or go in the direction of CQRS and Event Sourcing and start pre-generating view models as soon as the data changes. His answers were right, but caching in live APIs is sometimes impossible, and Event Sourcing is very, very expensive, both in terms of implementation and of changing developers' approach in an existing environment.
There is one more reason why my company wants to reduce communication time and storage costs: we are slowly moving toward Internet of Things and Big Data technologies. And, in fact, a Big Data workshop was the place where I learned about the Avro format.
I thought about these problems and found one simple solution fulfilling 3 main assumptions:
- Reduce the network traffic
- Decrease the communication time between microservices
- Minimal implementation cost - use REST API
But, first things first. I will start with a few words about why we are all currently using JSON APIs.
JSON - as the standard
The JSON format offers a number of key features that I could not imagine using a REST API without. The most important are the clear, easily readable format and the consistent data model. It is also worth mentioning the number of tools available to parse, read, or edit JSON, and even to generate it automatically from C# models.
In fact, JSON has only one main disadvantage that comes to my mind: every response and request is sent as plain text. Usually this is not a big deal, but in the case under consideration, the lack of a default compression mechanism was the factor that brought the topic to the table.
Avro - as the evolution
Let me now briefly introduce the Avro format. For a detailed description, please follow the links to the Apache wiki at the end of the article.
An Avro file is built of 2 main parts:
- The header, which contains information about the used codec (compression algorithm) and the readable schema of the data.
- The data itself, compressed to a binary representation.
Take a look at the illustrative example below.
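As an illustration, consider a simple C# model and an approximation of what lands in the Avro file (the schema shown is hand-written; the exact schema generated by a given serializer may differ in details):

public class User
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// The header of an Avro file produced from this model carries a readable
// JSON schema, roughly:
// {
//   "type": "record",
//   "name": "User",
//   "fields": [
//     { "name": "Name", "type": ["null", "string"] },
//     { "name": "Age", "type": "int" }
//   ]
// }
// ...followed by the binary-encoded, compressed records themselves.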
It doesn't look very impressive here. But imagine a very, very long JSON. Its size would increase linearly with the number of records, while for Avro the header and schema stay the same; what grows is only the amount of encoded, well-compressed data.
The Avro format inherits the readability of JSON. Note the schema representation: it can be easily read and extracted from the content. In real-life cases this is very helpful, e.g. during integration tests: I can call an API and read just the schema of the data model to prepare my models for deserialization.
Take a look at the data: you are not able to read it at first glance, and that is also a benefit. API responses can easily be intercepted by network tools, and you can even peek at responses in a web browser. From time to time it happens that someone spots data that should not be read by an unauthorized person. Keeping the data encoded increases the security of the solution. Reading Avro is not a big problem for a motivated person, but it reduces the probability of accidental data leaks.
The benchmark results
Moving on to raw numbers, the table below shows the results of BenchmarkDotNet performing requests against JSON and Avro APIs sending the same response but configured with different response serializers.
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1139 (1909/November2018Update/19H2)
Intel Core i7-7820HQ CPU 2.90GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.402
  [Host]     : .NET Core 3.1.8 (CoreCLR 4.700.20.41105, CoreFX 4.700.20.41903), X64 RyuJIT [AttachedDebugger]
  DefaultJob : .NET Core 3.1.8 (CoreCLR 4.700.20.41105, CoreFX 4.700.20.41903), X64 RyuJIT
| Serializer | Request Time | Serializer Time* | Allocated Memory | Compressed Size |
|------------|--------------|------------------|------------------|-----------------|
| Json       | 672.3 ms     | 173.8 ms         | 52.23 MB         | 6044 kB         |
| Avro       | 384.7 ms     | 159.2 ms         | 76.58 MB         | 2623 kB         |
| JsonGzip   | 264.1 ms     | 232.6 ms         | 88.32 MB         | 514 kB          |
| JsonBrotli | 222.5 ms     | 210.5 ms         | 86.15 MB         | 31 kB           |
| AvroBrotli | 193.5 ms     | 184.7 ms         | 74.75 MB         | 31 kB           |
| AvroGzip   | 181.2 ms     | 168.5 ms         | 75.05 MB         | 104 kB          |
*Time needed only for serialization and deserialization
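As a rough sketch, a measurement like this could be structured as follows with BenchmarkDotNet (the class name and endpoint URLs are hypothetical, not the original benchmark code):

using System.Net.Http;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // produces the Allocated Memory column
public class ApiSerializationBenchmark
{
    private readonly HttpClient _client = new HttpClient();

    // Hypothetical endpoints returning the same payload with different serializers
    [Benchmark(Baseline = true)]
    public Task<string> Json() => _client.GetStringAsync("http://localhost:5000/users/json");

    [Benchmark]
    public Task<byte[]> Avro() => _client.GetByteArrayAsync("http://localhost:5000/users/avro");
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<ApiSerializationBenchmark>();
}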
The serializers used in this experiment are Newtonsoft.Json and AvroConvert, a library that I created to handle serialization and deserialization of C# objects to the Avro format: AvroConvert: https://github.com/AdrianStrugala/AvroConvert. I have strongly focused on the dev workflow and usability of the package. Its interface should be clear and familiar for every user.
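To give a feel for the interface, here is a minimal round-trip sketch using the calls that appear throughout this article (the User class is the illustrative model from above):

using SolTechnology.Avro;

public static class AvroConvertDemo
{
    public static void Run()
    {
        var user = new User { Name = "Adrian", Age = 30 };

        // Serialize: produces a complete Avro file (header with schema + encoded data)
        byte[] avroBytes = AvroConvert.Serialize(user);

        // Deserialize back into a strongly typed object
        User result = AvroConvert.Deserialize<User>(avroBytes);
    }
}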
In the benchmark, the standard JSON API provides the slowest response, and the returned object is the biggest. Just switching the serialization library to AvroConvert reduced the response size by 56.6% and the response time by 42.8%(!!!). After setting up the proper codec for this case, the final result was a request time over 3.5 times shorter and a result object nearly 60 times smaller than the initial one.
If we are working in a modern infrastructure and the network is not a big concern, we can look only at the serializer library time. The Avro library performs about 15 ms faster than JSON per request. Maybe not much, but scale matters: when an API responds 100 times per day, that is a 1.5 s gain; 1,000 times, 15 s. If this API keeps responding 1,000 times every day, the gain is about 1.5 hours per year.
For comparison purposes, I have also included JSON APIs with data compressed with GZip and Brotli. The results are quite good, but this approach sacrifices the advantages of the JSON format: the whole response is compressed and no longer readable, and the implementation is a little more complex than it was before. The real problem starts when my microservice calls a few others and they return responses in different compression formats. I would have to check the formats manually and write classes handling each of them. Not very convenient, nor automated.
Using Avro, this problem disappears. The Avro deserializer discovers the used codec by itself, from the file header, and automatically deserializes the data. It makes no difference whether another API responds with a gzipped, deflated, or raw Avro result.
What is more, one of the greatest features of the Avro format is the possibility to choose the codec type (the compression algorithm, in fact) used for serialization of the data. When size is the key factor, for example when storing large amounts of data, the codec can be changed just by selecting a different enum value during serialization. In the example above, Brotli would be the clear winner: enabling it decreases the object size to only 31 kB, about 200 times less than JSON.
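As a sketch, switching codecs is a one-argument change (the GZip and Brotli member names are assumed from the benchmark variants; the exact enum member names may differ):

// Default: data blocks are not compressed (CodecType.Null, as in the formatter below)
byte[] raw = AvroConvert.Serialize(user);

// Same call, different codec
byte[] gzipped = AvroConvert.Serialize(user, CodecType.GZip);
byte[] compact = AvroConvert.Serialize(user, CodecType.Brotli);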
To sum up, in the given scenario the Avro API fulfills all of the initial assumptions:
- Network traffic reduced by 98%
- Communication time between microservices decreased by 73%
- Implementation is simple - look at the section below
How to build Avro API
And finally, let's code this! The implementation in .NET Core 3.1 is very easy; in fact, we need just 3 classes:
- AvroInputFormatter
- AvroOutputFormatter
- HttpClient extensions that support Avro format
In my implementation, serialization is done by AvroConvert. Any implementation of Avro serialization should be compatible with it, though.
AvroInputFormatter
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc.Formatters;
using Microsoft.Net.Http.Headers;
using SolTechnology.Avro;

public class AvroInputFormatter : InputFormatter
{
    public AvroInputFormatter()
    {
        // Handle only requests sent as application/avro
        SupportedMediaTypes.Clear();
        SupportedMediaTypes.Add(MediaTypeHeaderValue.Parse("application/avro"));
    }

    public override async Task<InputFormatterResult> ReadRequestBodyAsync(InputFormatterContext context)
    {
        // Buffer the request body and deserialize it into the expected model type
        await using MemoryStream ms = new MemoryStream();
        await context.HttpContext.Request.Body.CopyToAsync(ms);
        object result = AvroConvert.Deserialize(ms.ToArray(), context.ModelType);
        return await InputFormatterResult.SuccessAsync(result);
    }
}
AvroOutputFormatter
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc.Formatters;
using Microsoft.Net.Http.Headers;
using SolTechnology.Avro;

public class AvroOutputFormatter : OutputFormatter
{
    private readonly CodecType _codec;

    public AvroOutputFormatter(CodecType codec = CodecType.Null)
    {
        // The codec (compression algorithm) is configurable per formatter instance
        _codec = codec;
        SupportedMediaTypes.Clear();
        SupportedMediaTypes.Add(MediaTypeHeaderValue.Parse("application/avro"));
    }

    public override async Task WriteResponseBodyAsync(OutputFormatterWriteContext context)
    {
        // Serialize the response object to Avro and write the raw bytes to the body
        var avroBody = AvroConvert.Serialize(context.Object, _codec);
        var response = context.HttpContext.Response;
        response.ContentLength = avroBody.Length;
        await response.Body.WriteAsync(avroBody);
    }
}
HttpClient extensions
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;
using SolTechnology.Avro;

public static class HttpClientAvroExtensions
{
    public static async Task<HttpResponseMessage> PostAsAvro(this HttpClient httpClient, string requestUri, object content)
    {
        // Serialize the payload to Avro and mark it with the application/avro media type
        var body = new ByteArrayContent(AvroConvert.Serialize(content));
        body.Headers.ContentType = new MediaTypeHeaderValue("application/avro");
        return await httpClient.PostAsync(requestUri, body);
    }

    public static async Task<T> GetAsAvro<T>(this HttpClient httpClient, string requestUri)
    {
        // Download the raw Avro bytes and deserialize them into T
        var response = await httpClient.GetByteArrayAsync(requestUri);
        return AvroConvert.Deserialize<T>(response);
    }
}
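With these extensions in place, calling an Avro API is a one-liner. A usage sketch, with an illustrative base address and the User model from earlier:

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class AvroApiClientDemo
{
    public static async Task RunAsync()
    {
        var client = new HttpClient { BaseAddress = new Uri("http://localhost:5000") }; // illustrative address

        // GET: fetch the Avro bytes and deserialize them into a typed model
        List<User> users = await client.GetAsAvro<List<User>>("api/users");

        // POST: serialize the payload and send it with the application/avro content type
        HttpResponseMessage response = await client.PostAsAvro("api/users", new User { Name = "Adrian", Age = 30 });
    }
}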
Modify Startup
services.AddMvc(options =>
{
    // Register the Avro formatters with the highest priority (index 0)
    options.InputFormatters.Insert(0, new AvroInputFormatter());
    options.OutputFormatters.Insert(0, new AvroOutputFormatter());
});
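On the controller side nothing changes: a plain action keeps working, and content negotiation picks the Avro formatter whenever a client sends Accept: application/avro, while JSON clients keep receiving JSON. A minimal sketch (the model and route are illustrative):

using System.Collections.Generic;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class UsersController : ControllerBase
{
    // Returned objects are serialized by AvroOutputFormatter for requests
    // carrying "Accept: application/avro", by the JSON formatter otherwise
    [HttpGet]
    public IEnumerable<User> Get() =>
        new[] { new User { Name = "Adrian", Age = 30 } };
}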
And - that's it. You've just sped up the responses of your APIs by at least 30%. Play with the serialization options and you can achieve even better results. I've gathered the methods used for communication in a separate library: https://www.nuget.org/packages/SolTechnology.Avro.Http/.
Thank you for reading the article. I hope you can make good use of the knowledge I just shared. In case of any questions, contact me at
[email protected]. Have a nice day!
Useful links
- http://avro.apache.org/
- https://cwiki.apache.org/confluence/display/AVRO/Index
- https://github.com/AdrianStrugala/AvroConvert