Getting Started With Apache Kafka: Introductory Guide

Introduction

In this article, I'll give a detailed introduction to Kafka and what it means for developers to use Apache Kafka as a streaming platform. Before we get into the details of Kafka, I would like to take a step back and highlight the current state of software development.

Software development

In the past, most applications were built as one giant monolith, meaning all the functionality resided in a single application.

The example here is a retail application whose related services are an order service, a payment service, an inventory service, and a notification service. All those services reside in one application and share the same database. This kind of architecture tends to struggle under heavy load, because the whole application has to be scaled as a single unit. Things have changed, and the current state of development uses a more modern approach: the microservices architecture.

Development

As we can see in the above image, the application itself is decomposed into microservices, and each microservice has its own database. However, to deliver business functionality or value as a whole, multiple microservices interact with each other using some communication protocol.

Apps built today also come with a new expectation: providing real-time notifications and processing events as they occur (e.g., live tracking of a food delivery person on the Swiggy or Zomato platform). To support that, we need middleware in between the services.

Middleware

A full-blown architecture will look like this. This is a microservice architecture, and the middleware that we are using here is an event streaming platform. At its core, each microservice has an API that generates a lot of events for other services. In a nutshell, each microservice has an API, an event producer, and an event consumer. The services communicate with each other through the event streaming platform. This fundamentally forms the basis for event-driven microservices.

What is an Event Streaming Platform?

An event streaming platform allows applications to produce and consume a stream of records, as in a messaging system. It is similar to the publish-subscribe (pub/sub) model.

The producer and consumer here are independent of each other, meaning the producer has no clue about which consumer is going to read this message.
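The decoupling described above can be sketched with a small in-memory model. This is only an illustration in plain Python, not Kafka's client API: the `ToyEventBus` class, the `orders` topic, and the event fields are all hypothetical names chosen for the example.

```python
from collections import defaultdict


class ToyEventBus:
    """In-memory stand-in for an event streaming platform (not Kafka's API)."""

    def __init__(self):
        # topic name -> append-only list of events
        self._topics = defaultdict(list)

    def publish(self, topic, event):
        # The producer only names a topic; it never references a consumer.
        self._topics[topic].append(event)

    def read_from(self, topic, position):
        # Consumers pull events starting at a position they choose.
        return self._topics[topic][position:]


bus = ToyEventBus()

# The order service (producer) publishes without knowing who will read.
bus.publish("orders", {"order_id": 1, "item": "book"})
bus.publish("orders", {"order_id": 2, "item": "pen"})

# Two unrelated consumers read the same events independently.
notification_view = bus.read_from("orders", 0)
inventory_view = bus.read_from("orders", 0)

print([e["order_id"] for e in notification_view])
print([e["order_id"] for e in inventory_view])
```

Because the producer addresses a topic rather than a recipient, a new consumer can be added later without touching the producing service.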

Streaming Platform

  1. A streaming platform also stores the stream of events so that it can be replayed if it's necessary.
  2. Events are generally retained in multiple servers to provide fault tolerance and availability.
  3. A streaming platform also allows the application to process the records as they occur.

Basically, these three principles form the foundation for the event streaming platform, and Apache Kafka is built on top of these principles. So here, the event streaming platform that we have can be replaced with Apache Kafka.
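As a rough sketch of these three principles, the toy model below keeps a copy of the log on several in-memory "servers", replays it after one of them is lost, and processes each record as it is read. Everything here (`ToyReplicatedLog`, `running_total`, the `amount` field) is a hypothetical name for illustration, and the copy-to-every-server scheme is a deliberate simplification rather than a description of how Kafka replicates data.

```python
class ToyReplicatedLog:
    """Toy model: one event log copied onto several in-memory 'servers'."""

    def __init__(self, server_count=3):
        self.replicas = [[] for _ in range(server_count)]
        self.alive = [True] * server_count

    def append(self, event):
        # Principle 2: keep a copy of every event on each live server.
        for i, replica in enumerate(self.replicas):
            if self.alive[i]:
                replica.append(event)

    def fail(self, index):
        # Simulate a server going down and losing its local copy.
        self.alive[index] = False
        self.replicas[index] = []

    def replay(self, start=0):
        # Principle 1: stored events can be read again from any position.
        for i, replica in enumerate(self.replicas):
            if self.alive[i]:
                return list(replica[start:])
        raise RuntimeError("no live server holds the log")


def running_total(events):
    # Principle 3: handle each record as it arrives, one result per event.
    total = 0
    for event in events:
        total += event["amount"]
        yield total


log = ToyReplicatedLog(server_count=3)
for amount in (10, 25, 5):
    log.append({"amount": amount})

log.fail(0)  # one server goes away; the events are still available
totals = list(running_total(log.replay()))
print(totals)  # [10, 35, 40]
```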

Apache Kafka

Is Kafka an enterprise messaging system?

Let me quickly tell you the difference between the traditional messaging system and Kafka.

Messaging system

  1. Traditional messaging systems have transient message persistence, meaning that once the records are read by the consumers, the messages are removed from the message broker. Kafka, in contrast, saves events to the file system of the machine where Kafka is installed, and the events are retained for a configurable period of time. All events in Kafka are immutable, meaning that once the records are sent to Kafka, they cannot be altered.
  2. In a traditional messaging system, it is the broker's responsibility to keep track of the messages consumed by the consumers and to remove them once they are read. In Kafka, it is the consumer's responsibility to keep track of which messages it has consumed, by recording its position (offset) in the log.
  3. With a traditional messaging system, we can target a specific consumer to read a message from the broker. That's not the case in Kafka: any consumer that has access to the broker can read the message.
  4. Traditional messaging systems are typically not designed around the principles of distributed systems, whereas Kafka is built on top of the core principles of a distributed system.

Note: A distributed system, in general, handles load well because it can spread the work across multiple machines.
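The first two differences can be made concrete with a toy comparison. The sketch below is plain Python with hypothetical class names (`ToyQueueBroker`, `ToyLogBroker`, `ToyConsumer`); it models only the behaviour described in the list (a read that removes the message versus a retained log with consumer-side positions) and does not use or mirror any real broker's API.

```python
from collections import deque


class ToyQueueBroker:
    """Traditional-style broker: a message is gone once it has been read."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # The broker removes the message as part of delivering it.
        return self._messages.popleft() if self._messages else None


class ToyLogBroker:
    """Kafka-style log: reads never remove anything."""

    def __init__(self):
        self._log = []

    def send(self, message):
        self._log.append(message)

    def read(self, offset):
        # The caller supplies the offset; the broker keeps no per-consumer state.
        return self._log[offset] if offset < len(self._log) else None


class ToyConsumer:
    """Tracks its own position in the log."""

    def __init__(self, broker):
        self.broker = broker
        self.offset = 0

    def poll(self):
        message = self.broker.read(self.offset)
        if message is not None:
            self.offset += 1  # the consumer, not the broker, advances the position
        return message


queue = ToyQueueBroker()
queue.send("order-created")
first_read = queue.receive()   # the message is delivered and removed
second_read = queue.receive()  # nothing left for a second reader

log = ToyLogBroker()
log.send("order-created")
billing, shipping = ToyConsumer(log), ToyConsumer(log)
billing_msg = billing.poll()
shipping_msg = shipping.poll()  # same event, read independently

print(first_read, second_read, billing_msg, shipping_msg)
```

In the queue model the second reader finds nothing, while in the log model both consumers see the same event because each one advances only its own offset.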

Use Cases of Apache Kafka

Kafka

  1. Transportation Domain: Kafka can be used for many different use cases in the transportation domain.
    1. Booking rides online through the App is pretty common today.
    2. Kafka can be used for sending real-time driver-tracking notifications to the rider.
    3. Ordering food online is also a pretty common scenario today, and Kafka can be used to provide real-time tracking of the driver delivering the food.
  2. Retail Domain
    1. In the retail world, Kafka can be used to provide real-time sale notifications.
    2. It can also power real-time purchase recommendations based on previous purchases, as well as real-time tracking of online orders.
  3. Banking Domain
    1. In the banking world, Kafka can be used to raise real-time alerts on fraudulent transactions.
    2. It can also be used to send notifications about new features and products, and more.
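To give a feel for what "real-time alerts" means in practice, here is a minimal sketch of processing a stream of transactions one at a time. It assumes a deliberately naive rule (flag anything above a fixed amount); the `flag_suspicious` function, the `limit` value, and the record fields are hypothetical, and a plain Python iterator stands in for the stream of events that a consumer would receive.

```python
def flag_suspicious(transactions, limit=1000):
    """Yield an alert for each transaction above `limit`, as it arrives."""
    for tx in transactions:
        if tx["amount"] > limit:
            yield {"account": tx["account"], "amount": tx["amount"], "alert": "review"}


# A plain iterator stands in for the incoming event stream.
incoming = iter([
    {"account": "A-1", "amount": 40},
    {"account": "B-7", "amount": 5200},
    {"account": "A-1", "amount": 990},
])

alerts = list(flag_suspicious(incoming, limit=1000))
print(alerts)
```

Because the function is a generator, each alert is produced as soon as the matching transaction is seen, rather than after the whole batch has been collected.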

Conclusion

This brings us to the end of the article, in which we covered the following concepts.

  1. What is Apache Kafka?
  2. How Kafka is built on the principles of an event streaming platform.
  3. How Kafka differs from an enterprise messaging system.
  4. Use cases of Kafka in different domains.

Disclaimer on photos used: All photos used in the above article are taken from either Udemy or Google. Copyright belongs to the respective owners.