Data Loss and Data Duplication in Kafka
Kafka is a fast, scalable, durable, and distributed publish-subscribe messaging system.
Organizations from large internet companies to startups use it in their streaming data pipelines for its speed, durability, and ease of use and administration. Kafka makes strong durability promises, yet there are situations in which data can be lost or duplicated. While data duplication is easy to detect, data loss is not, especially when you are dealing with hundreds of millions of messages.
This session will look into those scenarios and possible ways to detect and minimize data loss and data duplication.
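As a minimal sketch of one such trade-off (assuming a broker at localhost:9092 and a topic named "events", both hypothetical), the producer settings below show how guarding against loss can introduce duplication: acks=all and aggressive retries keep a transient failure from dropping a record, but a retried send whose first attempt actually succeeded would write the record twice unless idempotence is enabled.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Wait for all in-sync replicas to acknowledge, so a leader failing
        // right after accepting a write does not silently lose the record.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        // Retry transient failures instead of dropping the send...
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));

        // ...but retries alone can duplicate a record whose first attempt
        // succeeded; idempotence lets the broker deduplicate those resends.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // A non-retriable failure here means the message was
                            // lost; surface it so losses are at least detectable.
                            exception.printStackTrace();
                        }
                    });
        }
    }
}
```

Without these settings, the defaults lean the other way: a producer using weaker acknowledgments trades durability for latency, which is exactly the kind of silent-loss scenario this session examines.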