This article explains how Exactly-Once Processing in Kafka works internally. It assumes that the reader is already familiar with the basics of Kafka and its ecosystem.

For a quick recap of Kafka, the reader can refer to my previous article.

Message Delivery Guarantees

Kafka supports three types of message delivery guarantees.

  1. At-most-once: Every message is persisted in Kafka at most once. Message loss is possible if the producer doesn’t retry on failures.
  2. At-least-once: Every message is guaranteed to be persisted in Kafka at least once. …
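The difference between the first two guarantees can be illustrated with a toy simulation. This is only a sketch, not Kafka's producer API: `FlakyBroker`, `send_at_most_once`, and `send_at_least_once` are hypothetical names, and the broker's failures are scripted rather than real. It shows that a producer that never retries can lose a message, while a producer that retries until acknowledged can persist the same message twice when only the acknowledgement was lost.

```python
class FlakyBroker:
    """Toy broker (hypothetical) whose send outcomes are scripted in advance."""

    def __init__(self, outcomes):
        # Each outcome is "ok", "lost" (message never arrives),
        # or "ack_lost" (message is persisted but the ack never arrives).
        self.outcomes = list(outcomes)
        self.log = []

    def send(self, message):
        outcome = self.outcomes.pop(0) if self.outcomes else "ok"
        if outcome == "lost":
            return False              # nothing was persisted
        self.log.append(message)      # message was persisted
        return outcome == "ok"        # "ack_lost" looks like a failure to the producer


def send_at_most_once(broker, message):
    """Send once and never retry: the message may be lost."""
    broker.send(message)


def send_at_least_once(broker, message, max_attempts=5):
    """Retry until acknowledged: the message may be duplicated."""
    for _ in range(max_attempts):
        if broker.send(message):
            return True
    return False


lossy = FlakyBroker(["lost"])
send_at_most_once(lossy, "m1")
print(lossy.log)       # the message never reached the log

dup = FlakyBroker(["ack_lost"])
send_at_least_once(dup, "m1")
print(dup.log)         # the retry persisted the message a second time
```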

This article describes the internal workings of Apache Zookeeper. It starts by explaining the various components of Zookeeper and then dives deep into its architecture.

Introduction

Apache Zookeeper is a distributed coordination service that is used by applications to implement various distributed primitives like leader election, configuration management, membership management, etc…

In an application involving multiple components, the components of the system need to work together and coordinate to achieve a result.

For example, in a master-worker architecture:

  1. The master needs to identify which of its workers are processing tasks and which are idle and able to execute more tasks
  2. The master needs…

This article gives a glimpse into the internals of some popular storage structures used in databases and distributed systems.

The following storage structures will be covered in this article: Bloom Filter, LSM Trees, B+ Trees, Inverted Index, Merkle Trees, Consistent Hashing, Skip Lists, HyperLogLog, Count-Min Sketch.

Bloom Filter

A Bloom filter is a probabilistic data structure used to test whether an element is a member of a set.

It is probabilistic because it can produce false-positive matches (The data structure can return the result saying an element is possibly present in the set which may not be 100%…
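A minimal sketch of the idea, assuming a fixed-size bit array and a few hash positions derived from SHA-256 with different seeds. The class name, method names, and default sizes are illustrative choices rather than anything prescribed by the article. A lookup can report "possibly present" for an element that was never added, but it never reports "absent" for one that was.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hashing scheme are assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit slot, for readability

    def _positions(self, item):
        # Derive num_hashes positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("kafka")
print(bf.might_contain("kafka"))  # an added element is always reported as possibly present
```

Shrinking `num_bits` makes false positives more likely, because more elements end up sharing the same bit positions.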

Introduction

This article gives a glimpse of what exactly happens when a message is produced to Kafka, followed by how it is stored in Kafka and finally how it is consumed by a consumer.

Before that, let’s go through some basic constructs and terminologies used in Kafka.

Apache Kafka is a distributed pub/sub messaging system where producers publish messages to Kafka and consumers subscribe to certain classes of messages and consume them. It is often regarded as a distributed commit log, since messages published to Kafka are stored reliably, in order, until the retention period expires.
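The commit-log view can be sketched as a toy, single-process, append-only log. This is not how Kafka is implemented; `CommitLog` and its methods are hypothetical names, and timestamps are passed in explicitly to keep the example deterministic. The sketch shows messages receiving increasing offsets, consumers reading in order from an offset of their choice, and old records disappearing once they fall outside the retention period.

```python
class CommitLog:
    """Toy append-only log with time-based retention (illustrative only)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = []       # (offset, timestamp, message), oldest first
        self.next_offset = 0

    def append(self, message, timestamp):
        """Producer side: append a message and return its offset."""
        offset = self.next_offset
        self.records.append((offset, timestamp, message))
        self.next_offset += 1
        return offset

    def read_from(self, offset):
        """Consumer side: read messages in order, starting at an offset."""
        return [(o, m) for (o, _, m) in self.records if o >= offset]

    def expire(self, now):
        """Drop records older than the retention period."""
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r[1] >= cutoff]


log = CommitLog(retention_seconds=10)
log.append("order-created", timestamp=0)
log.append("order-paid", timestamp=5)
print(log.read_from(0))   # both messages, in the order they were appended
log.expire(now=12)
print(log.read_from(0))   # only the message still inside the retention period
```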

Message: Message is a…

sudan
