30 open-source projects similar to redpanda-data/redpanda, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Redpanda alternative.
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
RocketMQ is a distributed messaging and streaming platform designed for building event-driven applications. It serves as middleware to decouple services using publish-subscribe and request-reply patterns, and functions as a transactional messaging system that ensures atomicity by linking message delivery to local transaction outcomes. The platform includes specialized capabilities as a Kubernetes-native message broker for container orchestration environments and an MQTT broker for ingesting event data from mobile applications and hardware terminals. The system covers high-throughput data str
Kafka is a distributed event streaming platform designed for capturing, storing, and processing real-time data streams across interconnected nodes. It functions as a distributed commit log, providing a fault-tolerant storage mechanism that records state changes sequentially to ensure data consistency and durability across distributed environments. The platform distinguishes itself through a partitioned commit log architecture that enables horizontal scaling and parallel processing of data streams. It integrates a stream processing engine for continuous transformations and aggregations, while
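As a rough illustration of the partitioned commit log idea described above, the following is a minimal in-memory sketch, not Kafka's implementation. The `PartitionedLog` class and its methods are hypothetical names, and CRC32 is used here only as a convenient stand-in hash for key-based partitioning.

```python
import zlib


class PartitionedLog:
    """Toy partitioned, append-only log (hypothetical; not Kafka's API)."""

    def __init__(self, num_partitions: int) -> None:
        # Each partition is an independent, ordered list of records.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Assumption: CRC32 as a stand-in hash; records with the same key
        # always map to the same partition.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key: bytes, value: bytes) -> tuple:
        # Appending never rewrites earlier records; the offset is simply
        # the record's position within its partition.
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition: int, offset: int, max_records: int = 10) -> list:
        # Consumers read sequentially from an offset they track themselves.
        return self.partitions[partition][offset:offset + max_records]
```

Because each partition is ordered independently, records sharing a key keep their relative order, while different partitions can be read in parallel by separate consumers.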
Prefect is a workflow orchestration platform designed to define, schedule, and monitor complex data pipelines as Python code. It functions as a container-native engine that wraps individual tasks in isolated environments, ensuring consistent dependencies and resource allocation across diverse infrastructure. By utilizing a state-machine-based orchestration model, the system tracks execution progress through discrete transitions and persistent event logs to maintain reliable and observable task processing. The platform distinguishes itself through a decoupled worker-API architecture, which sep
Unstructured is an enterprise-grade data orchestration engine designed to transform raw, unstructured files into structured, machine-readable formats. It functions as a comprehensive platform for document ingestion, partitioning, and enrichment, specifically engineered to prepare complex data for retrieval-augmented generation and agentic AI workflows. The platform distinguishes itself through its sophisticated document processing strategies, which combine rule-based extraction with vision-language models to handle diverse file layouts, tables, and images. It provides a modular architecture t
Fluvio is a distributed event streaming platform and cloud-native streaming engine designed for collecting, persisting, and replicating real-time data streams across a distributed cluster. It functions as a real-time data pipeline for building stateful workflows that ingest, enrich, and export data between external sources and sinks. The platform is distinguished by its use of WebAssembly to execute compiled modules for in-line data transformations and filtering. This allows for the execution of custom business logic to reshape information in motion without requiring a restart of the cluster.
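The in-line filter and transform idea can be sketched in a few lines. This is plain Python standing in for compiled WebAssembly modules and does not use Fluvio's API; `keep_errors`, `redact_user`, and `apply_inline` are hypothetical names, and the JSON record shape is an assumption made for the example.

```python
import json
from typing import Callable, Iterable, Iterator, Optional

# A transform returns the (possibly rewritten) record, or None to drop it.
Transform = Callable[[bytes], Optional[bytes]]


def keep_errors(record: bytes) -> Optional[bytes]:
    # Filter step: keep only records whose "level" field is "error".
    event = json.loads(record)
    return record if event.get("level") == "error" else None


def redact_user(record: bytes) -> Optional[bytes]:
    # Map step: strip the "user" field before the record moves downstream.
    event = json.loads(record)
    event.pop("user", None)
    return json.dumps(event, sort_keys=True).encode()


def apply_inline(records: Iterable[bytes],
                 transforms: list) -> Iterator[bytes]:
    # Run each record through the chain; stop early if a step drops it.
    for record in records:
        out: Optional[bytes] = record
        for transform in transforms:
            out = transform(out)
            if out is None:
                break
        if out is not None:
            yield out
```

Since the chain is just a list of functions, swapping a transform changes what happens to records in motion without touching the code that moves them, which is loosely analogous to loading a new module without restarting the cluster.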
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Azure Docs is the official technical documentation repository for Microsoft Azure, the cloud computing platform. It provides comprehensive guidance on the full spectrum of Azure services, covering everything from core infrastructure components like virtual machines, Kubernetes clusters, and serverless computing to platform services for AI, machine learning, data analytics, and storage. The documentation details how to provision, manage, and govern cloud resources at scale, including policy enforcement, identity management, and cost optimization. The documentation distinguishes Azure through i
Memgraph is an in-memory, distributed graph database designed for high-performance labeled property graph management. It utilizes a Cypher query engine for declarative data retrieval and manipulation, providing a scalable knowledge graph backend that integrates vector search and graph traversals. The system distinguishes itself as a real-time graph analytics platform, employing native C++ and CUDA implementations to execute complex network analysis and dynamic community detection on streaming data. It provides specialized support for AI integration, including GraphRAG capabilities, the constr
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
dbt-core is a command-line framework for transforming data within a warehouse using modular SQL and version control. It functions as a data transformation engine that enables users to define data structures and business logic through declarative configuration files, which the system then compiles into executable code. By managing complex data dependencies through a directed acyclic graph, it ensures that transformation tasks execute in the correct order while maintaining a manifest-driven state to track lineage and execution history. The project distinguishes itself through an adapter-based d
Dagster is a data orchestration platform designed to manage the entire lifecycle of data assets through declarative modeling and version-controlled code. It functions as a workflow engine that treats data assets as first-class primitives, allowing teams to define, schedule, and monitor complex pipelines while maintaining clear visibility into lineage, dependencies, and data quality. The platform distinguishes itself by using a code-as-configuration framework that enables standard software engineering practices, such as unit testing and local mocking, to be applied directly to data workflows.
h2o is a high-performance content delivery server and HTTP/3 web server. It functions as a network gateway and reverse proxy that forwards client requests to upstream servers to manage traffic flow and load. The project also includes protocol fuzzing tooling, using a testing framework to run automated stress tests against its network protocol handling to identify memory leaks and crashes. The server provides capabilities for secure web traffic management through encrypted data transmission and high-performance web serving across HTTP/1, HTTP/2, and HTTP/3. It includes tools for server r
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Mio is a low-level I/O library for Rust that provides an event-driven framework for monitoring multiple network sockets and file descriptors. It acts as a portable wrapper for operating system native polling systems, including epoll, kqueue, and IOCP, allowing applications to trigger events when resources are ready for reading or writing without blocking the execution thread. The library provides a non-blocking socket interface for managing TCP, UDP, and Unix sockets. It distinguishes itself through a vectored I/O implementation, enabling scatter-gather reads and writes across multiple buffer
This project is a detailed analysis and study of the Nginx source code, focusing on high-performance server architecture and function call flows. It serves as a technical examination of the internal C implementation used to build high-concurrency networking systems. The project deconstructs the internal mechanisms of the web server, including the multi-process master-worker model, event-driven asynchronous I/O, and non-blocking socket communication. It analyzes the phase-based request processing lifecycle, from URI matching and header parsing to final content generation. The study covers a b
Horizon is a real-time API server and RethinkDB backend designed to push database changes instantly to front-end clients. It uses a WebSocket data streaming API to synchronize data between the database and user interfaces without requiring manual polling. The project integrates an OAuth identity manager for verifying user identities through third-party providers and a role-based access control system to define granular permissions for viewing or modifying database documents. It is delivered as a containerized backend framework, allowing the server and its dependencies to be deployed as a p
This project is a collection of educational resources and reference implementations for the Apache Flink stream processing framework. It provides a learning resource focused on mastering distributed stream processing through implementation guides, performance tuning tutorials, and practical examples. The repository features detailed walkthroughs for building real-time data pipelines using the DataStream and Table APIs. It includes specific integration examples for connecting Apache Flink with Kafka brokers and Elasticsearch indices, as well as reference implementations for real-time deduplica
Quantaxis is a quantitative trading framework designed for building, backtesting, and executing automated strategies across global equities, futures, and cryptocurrencies. It integrates an event-driven backtesting engine, a multi-market execution gateway for order routing, and a quantitative data pipeline for ingesting and storing multi-asset market data. The system features a Rust-accelerated financial library that utilizes Apache Arrow for high-performance technical indicator calculation and zero-copy data processing. It provides a containerized infrastructure model designed for orchestrati
Snowplow is a behavioral event data pipeline and customer data infrastructure designed to capture user interactions and transform them into structured events for real-time analysis and long-term storage. It functions as a customer data platform that gathers user signals and enriches them with metadata to create a unified view of customer behavior. The system operates as an event schema validation engine to enforce strict data contracts on incoming streams, preventing data corruption. It further serves as a real-time event router and an event-driven automation platform, triggering proactive bu
Garnet is a multi-threaded in-memory database and distributed key-value store. It functions as a high-performance remote cache store that implements the RESP wire protocol to maintain compatibility with existing Redis clients and libraries. The project is distinguished by a shared-memory architecture that enables parallel request processing across multiple cores for sub-millisecond latency. It features a tiered storage system that automatically offloads colder data from system memory to SSD or cloud storage layers, and includes a specialized vector search database for high-dimensional similar
Apache NiFi is a flow-based programming platform that enables the visual design, monitoring, and management of data pipelines. At its core, it provides a web-based visual dataflow designer where users build directed graphs of processors to route, transform, and mediate data movement between any source and destination without writing custom code. The system records fine-grained data provenance for every data item from ingestion to delivery, supporting audit, debugging, and replay of data lineage. The platform distinguishes itself through a zero-master cluster architecture that distributes proc
Orchest is a data pipeline orchestrator and containerized workflow manager. It provides a platform for designing, scheduling, and executing complex data processing sequences through a combination of a graphical interface and scripting. The platform distinguishes itself by using containers to manage software dependencies, ensuring consistent execution across different environments. It features a polyglot task scheduler capable of triggering jobs written in multiple programming languages and includes a version control system that tracks historical snapshots of project configurations and code.
TigerBeetle is a distributed financial accounting database designed for high-volume transaction processing. It functions as a specialized transaction engine that enforces strict double-entry bookkeeping invariants, ensuring that every debit and credit is balanced and accounted for with absolute consistency. By utilizing a consensus-based replication model, the system provides high availability and data durability across geographically distributed clusters, making it suitable for mission-critical financial infrastructure. The system distinguishes itself through a performance-oriented architect
Iggy is a distributed message streaming platform and multi-protocol message broker that functions as a persistent distributed log store. It provides infrastructure for publishing and consuming binary messages using an append-only log, ensuring high availability and data consistency across nodes through Viewstamped Replication. The platform is distinguished by its specialized LLM streaming infrastructure, which uses a server protocol to connect large language models to streaming data and system controls. This includes standardized protocols for context management and data bridging via HTTP or
Luigi is a Python framework designed for building and managing complex batch data pipelines. It functions as a workflow orchestration engine that organizes tasks into directed acyclic graphs, ensuring that jobs execute in the correct logical order based on their dependencies. By utilizing a centralized scheduler, the system coordinates task execution across distributed environments, tracks global workflow state, and prevents redundant processing by verifying the existence of output targets before triggering any work. The project distinguishes itself through a robust state-tracking mechanism t
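The "check the output target before doing any work" behavior described above can be sketched as follows. This is a simplified illustration in plain Python, not Luigi's API: the `Task` class, its fields, and the `build` function are hypothetical, and output targets are assumed to be local files.

```python
import os
import tempfile


class Task:
    """Toy task whose completion is defined by its output file existing."""

    def __init__(self, name: str, output_path: str, requires=()) -> None:
        self.name = name
        self.output_path = output_path
        self.requires = list(requires)

    def complete(self) -> bool:
        # A task counts as done if its output target already exists.
        return os.path.exists(self.output_path)

    def run(self) -> None:
        # Stand-in for real work: write the output target.
        with open(self.output_path, "w") as handle:
            handle.write(self.name)


def build(task: Task, ran=None) -> list:
    # Depth-first walk of the dependency graph; returns names of tasks run.
    ran = [] if ran is None else ran
    if task.complete():
        return ran  # output exists, so skip this task and its upstream work
    for dependency in task.requires:
        build(dependency, ran)
    task.run()
    ran.append(task.name)
    return ran
```

Running `build` twice on the same task shows the idempotence: the first call executes dependencies in order, and the second returns an empty list because every output target already exists.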
Airbyte is a data integration platform designed to synchronize information between diverse applications, databases, and data warehouses. It functions as an extract, transform, and load orchestrator that manages automated data movement workflows across cloud, on-premise, and hybrid environments. The platform provides a standardized interface for connectors, enabling the movement of structured and unstructured data while maintaining stateful checkpoints for reliable incremental syncing. The platform distinguishes itself through a containerized architecture that isolates connectors to prevent de
Nomad is a distributed workload orchestrator and infrastructure automation platform designed to manage the lifecycle of applications across large-scale, heterogeneous environments. It functions as a multi-cloud orchestration engine, providing a unified control plane to deploy, scale, and govern containers, virtual machines, and legacy applications. By utilizing declarative job specifications, the system ensures infrastructure convergence and maintains the desired state across distributed data centers and geographic regions. The platform distinguishes itself through a flexible, plugin-based ar
Scylla is a distributed wide-column NoSQL database designed as a high-performance data store. It functions as a Cassandra-compatible database and a DynamoDB-compatible store, implementing a shared-nothing architecture built on an asynchronous event-driven framework. The system emulates cloud-based APIs to support applications built for proprietary cloud protocols and implements the Cassandra Query Language for high-throughput workloads. This allows for the migration of cloud workloads to self-hosted environments while maintaining API compatibility. The project covers distributed data storage
ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself