# apache/kafka

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/apache-kafka).**

32,011 stars · 14,967 forks · Java · apache-2.0

## Links

- GitHub: https://github.com/apache/kafka
- awesome-repositories: https://awesome-repositories.com/repository/apache-kafka.md

## Topics

`kafka` `scala`

## Description

Apache Kafka is a distributed event streaming platform for capturing, storing, and processing real-time data streams across a cluster of broker nodes. At its core it is a distributed commit log: records are appended sequentially and replicated across brokers, which provides fault-tolerant, durable storage of state changes in the order they were written.

The platform distinguishes itself through a partitioned commit log architecture: each topic is split into partitions that can be spread across brokers, which enables horizontal scaling and parallel processing of data streams. It includes a stream processing library (Kafka Streams) for continuous transformations and aggregations, and uses log-structured, append-only storage to keep disk access sequential and throughput high. Consumer groups track their own read positions (offsets) independently, and replication is pull-based: follower replicas fetch data from the partition leader, so the leader does not have to push each write to its followers.
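
As a rough illustration of the partitioning and offset-tracking ideas above, the following in-memory sketch models a topic as a set of append-only lists and keeps a separate read position per consumer group. It is not Kafka's API: `PartitionedLogSketch`, `append`, and `poll` are hypothetical names, and `String.hashCode` is only a stand-in for the partitioner a real producer would use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory model only; all names here are hypothetical
// and are not part of the Kafka client or broker API.
class PartitionedLogSketch {
    // One append-only list of records per partition.
    private final List<List<String>> partitions = new ArrayList<>();
    // Next offset to read, per consumer group and partition.
    private final Map<String, int[]> groupOffsets = new HashMap<>();

    PartitionedLogSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Routes a record to a partition by key hash and appends it.
    // Returns {partition, offset} of the appended record.
    long[] append(String key, String value) {
        int partition = Math.floorMod(key.hashCode(), partitions.size());
        List<String> log = partitions.get(partition);
        log.add(value);
        return new long[] {partition, log.size() - 1};
    }

    // Returns the records this group has not yet read from one partition
    // and advances only that group's offset.
    List<String> poll(String group, int partition) {
        int[] offsets = groupOffsets.computeIfAbsent(group, g -> new int[partitions.size()]);
        List<String> log = partitions.get(partition);
        List<String> batch = new ArrayList<>(log.subList(offsets[partition], log.size()));
        offsets[partition] = log.size();
        return batch;
    }

    public static void main(String[] args) {
        PartitionedLogSketch topic = new PartitionedLogSketch(3);
        int p = (int) topic.append("user-1", "login")[0];
        topic.append("user-1", "logout");
        System.out.println("billing reads: " + topic.poll("billing", p));
        System.out.println("audit reads:   " + topic.poll("audit", p));
        System.out.println("billing again: " + topic.poll("billing", p));
    }
}
```

Because each group keeps its own offsets, the hypothetical `billing` and `audit` groups both read the full partition without affecting one another, and records with the same key always land in the same partition.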

Beyond core streaming, the system supports event-driven microservices, log aggregation, and archiving. It uses zero-copy network transfers to minimize copying overhead when serving stored data. With tiered storage, it also provides a pluggable remote storage interface so that older log segments can be offloaded to external storage. Comprehensive documentation and API references are available to support integration and system management.
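
The zero-copy transfer mentioned above can be sketched with the JDK's `FileChannel.transferTo`, which lets the operating system move bytes from a file to another channel without staging them in an application buffer. This is a minimal sketch under stated assumptions: `ZeroCopySketch` and `sendFile` are hypothetical names, the target here is a second file rather than a network socket, and whether a true zero-copy path is taken depends on the platform and channel types.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; not taken from the Kafka code base.
class ZeroCopySketch {
    // Transfers the whole file to the target channel and returns the bytes sent.
    // transferTo may move fewer bytes than requested, so it is called in a loop.
    static long sendFile(Path source, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long position = 0;
            long size = in.size();
            while (position < size) {
                position += in.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for a log segment on disk.
        Path segment = Files.createTempFile("segment", ".log");
        Path copy = Files.createTempFile("copy", ".log");
        Files.write(segment, "record-0\nrecord-1\n".getBytes(StandardCharsets.UTF_8));
        // In a broker the target would be a socket channel; a file keeps the sketch self-contained.
        try (FileChannel out = FileChannel.open(copy, StandardOpenOption.WRITE)) {
            System.out.println("bytes sent: " + sendFile(segment, out));
        }
        Files.delete(segment);
        Files.delete(copy);
    }
}
```

The design point is that the application never reads the record bytes into its own heap; it only tells the kernel which byte range to send and where.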

## Tags

### Data & Databases

- [Distributed Event Streaming Platforms](https://awesome-repositories.com/f/data-databases/distributed-event-streaming-platforms.md) — Captures, stores, and processes real-time data streams across multiple interconnected nodes and clusters.
- [Distributed Commit Logs](https://awesome-repositories.com/f/data-databases/distributed-commit-logs.md) — Ensures data consistency across multiple nodes by maintaining a reliable and ordered sequence of state changes.
- [Data Streaming Platforms](https://awesome-repositories.com/f/data-databases/data-streaming-platforms.md) — Facilitates high-throughput movement of real-time event data between distributed systems.
- [Partitioned Commit Logs](https://awesome-repositories.com/f/data-databases/partitioned-commit-logs.md) — Enables horizontal scaling and parallel processing by splitting topics into multiple partitions distributed across a cluster.
- [Stream Processing Engines](https://awesome-repositories.com/f/data-databases/stream-processing-engines.md) — Analyzes and transforms continuous data feeds on the fly as they arrive.
- [Append-Only Storage Engines](https://awesome-repositories.com/f/data-databases/append-only-storage-engines.md) — Persists data as an immutable sequence of records to allow for high-throughput sequential disk operations.
- [Message Brokers](https://awesome-repositories.com/f/data-databases/message-brokers.md) — Organizes incoming data into topics whose partitions are ordered, immutable sequences of events for reliable consumption.
- [Consumer Offset Trackers](https://awesome-repositories.com/f/data-databases/consumer-offset-trackers.md) — Enables multiple independent consumers to process the same data stream at their own pace via stateful tracking.
- [Replication Protocols](https://awesome-repositories.com/f/data-databases/replication-protocols.md) — Maintains high availability by allowing follower nodes to pull data from the leader in the background.
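
The pull-based replication described in the last item can be pictured with a small in-memory model: each follower asks the leader for records starting at its own log end offset, and an offset counts as fully replicated only once every replica holds it. This is a simplified sketch, not broker code; `FollowerFetchSketch`, `Replica`, `fetch`, and `highWatermark` are hypothetical names, and the model assumes every follower is in sync and that nothing fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model with hypothetical names; it ignores failures,
// leader election, and in-sync replica membership changes.
class FollowerFetchSketch {
    static class Replica {
        final List<String> log = new ArrayList<>();

        int logEndOffset() {
            return log.size();
        }
    }

    // The follower pulls up to maxRecords from the leader, starting at its own
    // log end offset. Returns the number of records copied.
    static int fetch(Replica leader, Replica follower, int maxRecords) {
        int from = follower.logEndOffset();
        int to = Math.min(leader.logEndOffset(), from + maxRecords);
        follower.log.addAll(leader.log.subList(from, to));
        return to - from;
    }

    // The highest offset that the leader and every listed follower have all reached.
    static int highWatermark(Replica leader, List<Replica> followers) {
        int hw = leader.logEndOffset();
        for (Replica follower : followers) {
            hw = Math.min(hw, follower.logEndOffset());
        }
        return hw;
    }

    public static void main(String[] args) {
        Replica leader = new Replica();
        Replica f1 = new Replica();
        Replica f2 = new Replica();
        leader.log.addAll(Arrays.asList("r0", "r1", "r2"));

        fetch(leader, f1, 10); // f1 catches up completely
        fetch(leader, f2, 2);  // f2 lags by one record
        System.out.println("high watermark: " + highWatermark(leader, Arrays.asList(f1, f2)));
    }
}
```

In this model the leader appends without waiting for followers, and the slowest replica determines how far the log counts as fully replicated.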

### Software Engineering & Architecture

- [Event-Driven Architectures](https://awesome-repositories.com/f/software-engineering-architecture/event-driven-architectures.md) — Decouples service communication by using an asynchronous message bus to trigger actions across components.

### Networking & Communication

- [Zero-Copy Networking](https://awesome-repositories.com/f/networking-communication/zero-copy-networking.md) — Minimizes CPU overhead by sending data from the operating system's page cache to the network socket without copying it through application memory.
