30 open-source projects similar to infinyon/fluvio, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best Fluvio alternative.
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Storm is a distributed stream processing framework designed to execute unbounded computations across a cluster to process real-time data streams. It functions as a data pipeline orchestrator that allows users to define and deploy declarative data flow graphs connecting streaming sources to processing components. The system operates as a multi-tenant distributed compute engine that isolates workloads and limits resource usage across shared clusters using dedicated pools and access control. It is also a secure distributed processing engine that employs encrypted node communication and SSL-secur
This project is a collection of educational resources and reference implementations for the Apache Flink stream processing framework. It provides a learning resource focused on mastering distributed stream processing through implementation guides, performance tuning tutorials, and practical examples. The repository features detailed walkthroughs for building real-time data pipelines using the DataStream and Table APIs. It includes specific integration examples for connecting Apache Flink with Kafka brokers and Elasticsearch indices, as well as reference implementations for real-time deduplica
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
RocketMQ is a distributed messaging and streaming platform designed for building event-driven applications. It serves as middleware to decouple services using publish-subscribe and request-reply patterns, and functions as a transactional messaging system that ensures atomicity by linking message delivery to local transaction outcomes. The platform includes specialized capabilities as a Kubernetes-native message broker for container orchestration environments and an MQTT broker for ingesting event data from mobile applications and hardware terminals. The system covers high-throughput data str
GreptimeDB is a distributed, open-source time-series database built for unified observability. It stores and queries metrics, logs, and traces together in a single columnar engine, supporting both SQL and PromQL for analysis. The database is designed as a Kubernetes-native operator with a decoupled compute and storage architecture, enabling horizontal scaling and multi-region deployment. What distinguishes GreptimeDB is its role as a multi-protocol ingestion gateway, accepting data through OpenTelemetry, Prometheus Remote Write, InfluxDB, Loki, Elasticsearch, Kafka, and MQTT protocols without
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Apache Storm is a distributed stream processing framework and real-time data processing engine. It functions as a fault-tolerant distributed computing system designed to analyze data in motion across a cluster of machines for continuous stream computation. The system enables the creation of fault-tolerant data pipelines and scalable event processing by distributing workloads across a network of computing nodes. This architecture ensures low latency and high throughput for live data while allowing the system to recover automatically from individual node failures. The framework provides capabi
Redpanda is a distributed event streaming engine designed to serve as a high-performance, drop-in replacement for existing event-driven architectures. It provides a foundation for building and scaling applications that require reliable data movement, analytical querying, and strict operational compliance across both cloud and self-managed environments. The platform distinguishes itself through a shared-nothing architecture that utilizes thread-per-core execution and a non-blocking asynchronous input/output engine to maximize throughput. It maintains data consistency through a consensus-based
Apache Spark is a unified distributed data processing engine designed for large-scale data analysis and computation graphs. It functions as a distributed machine learning framework, a graph processing system, a real-time stream processor, and a SQL analytics engine. The system enables the execution of distributed SQL querying, large-scale graph analysis, and real-time stream analytics across clusters of machines. It also provides a scalable environment for implementing machine learning algorithms and predictive model development on massive datasets. The engine incorporates relational query e
Benthos is a stream processing engine and data integration pipeline used for routing, transforming, and connecting data streams between diverse sources and sinks. It functions as event routing middleware and a change data capture tool, streaming real-time database modifications as discrete events for downstream processing. The system utilizes a declarative pipeline configuration, where data flow and processing logic are defined in a single static file. It features a specialized domain-specific language for mapping, filtering, and enriching data payloads, allowing for complex transformations w
YDB is a distributed SQL database and analytical engine designed for horizontal scalability and strong consistency. It functions as a multi-model system that supports transactional and analytical workloads through a distributed architecture providing serializable ACID transactions. The system is distinguished by its broad protocol compatibility, implementing the PostgreSQL wire protocol for standard SQL drivers and the Kafka protocol for messaging and streaming. It further serves as a vector database, supporting vector indexes and approximate nearest neighbor searches for semantic search and
Perspective is a columnar data analytics library and streaming data visualization engine. It provides an interactive data grid component and notebook analytics widgets designed for processing high-volume data and rendering interactive charts and grids. The system utilizes a high-performance query engine to enable real-time data analysis and streaming dataset visualization. It supports the creation of customizable dashboards and reports that update automatically as new data arrives without requiring full dataset reloads. The project covers large-scale dataset analytics through a schema-driven
Faust is a Python library for building distributed stream processing applications that integrate with Kafka. It functions as an asynchronous stream processor designed to handle high-throughput event streams and real-time data analysis using asynchronous functions. The system operates as a distributed stream processor and state store, utilizing sharding and partitioned topics to scale processing workloads horizontally across multiple worker nodes. It maintains state through a replicated key-value storage system backed by local databases to ensure high availability and fast recovery. The frame
YouPlot is a command-line plotting utility and terminal data visualization tool used to render statistical plots and charts directly within a terminal interface using Unicode characters. It functions as a Unix pipeline plotter, allowing users to visualize numerical data without leaving the shell. The project operates as a real-time data visualizer, drawing plots progressively as data streams into the system. It integrates into command-line pipelines by reading data from standard input to provide real-time stream monitoring and data analysis. The tool covers a variety of rendering capabilitie
Orleans is a .NET distributed actor framework designed for building scalable, cloud-native applications. It implements a virtual actor model where entities with stable identities manage their own state and lifecycle across a cluster of servers. The framework provides a distributed state management system with ACID transaction support and a distributed pub/sub streaming engine for real-time data processing. It distinguishes itself through location-transparent routing, automatic actor activation and deactivation, and elastic cluster scaling that redistributes workloads during node failures. Th
Soketi is a high-performance, protocol-compatible WebSocket server designed for real-time, bidirectional communication. It functions as a multi-tenant gateway that manages persistent client connections and broadcasts events across private, presence, and public channels. By implementing a standardized messaging protocol, it allows existing client SDKs and broadcasting frameworks to integrate without requiring modifications to application logic. The project distinguishes itself through its focus on operational stability and multi-tenant isolation. It supports granular, per-application resource
This project is a data processing engine and AI application platform designed for building production-grade machine learning workflows. It provides a unified programming model that handles both historical batch data and live stream ingestion, enabling the development of real-time ETL pipelines and scalable data transformation workflows. The framework distinguishes itself through differential dataflow execution, which propagates only changes through a pipeline rather than recomputing entire datasets. It supports distributed state management across worker nodes and utilizes incremental stream p
This project is a streaming data integration framework that captures real-time database changes and synchronizes them with downstream systems. It operates as a distributed streaming ETL and database synchronizer, reading database logs and snapshots to propagate row-level modifications to target sinks. The system supports declarative data integration, allowing users to define source-to-sink data flows using SQL or YAML configurations. It distinguishes itself by automating schema evolution to maintain synchronization when source structures change and ensuring exactly-once delivery and processin
Materialize is a streaming SQL database that continuously ingests live data from sources such as Kafka, Redpanda, PostgreSQL, and MySQL, and incrementally maintains materialized views. It provides a PostgreSQL-compatible query engine that accepts standard SQL over the PostgreSQL wire protocol, enabling any existing SQL client or BI tool to query real-time data. The system also includes a Model Context Protocol (MCP) server that exposes live materialized view data to AI agents, providing fresh context without polling. Materialize distinguishes itself through its ability to offer configurable c
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Cassandra is a distributed NoSQL database and wide-column store designed for high availability and linear scalability. It functions as a fault-tolerant distributed system that utilizes an LSM-tree storage engine to optimize write throughput and manage massive datasets. The system is a CQL-compliant database, using a structured query language to manage and retrieve tabular data stored across multiple nodes. It organizes information into rows and columns based on a flexible schema and primary keys. The project provides capabilities for horizontal database scaling, distributed data partitioning
Okio is a Java I/O library providing a set of tools for efficient byte-stream processing and file system operations. It functions as a buffered byte stream handler and streaming data transformer, utilizing a cross-platform file system API to manage data movement. The project is distinguished by its use of pooled mutable byte buffers that treat sequences as queues to reduce memory copying and garbage collection churn. It further decouples file operations from the host operating system through an abstraction-based file system, allowing for consistent path manipulation and atomic operations acro
This project provides educational materials and courseware focused on the theoretical and practical foundations of distributed systems design. It serves as a comprehensive curriculum covering the disciplines of consensus, data consistency, reliability engineering, and scalability. The instructional content focuses on achieving cluster agreement through consensus algorithms and managing system-wide state via coordination frameworks. It includes a dedicated guide to data theory, exploring replication strategies, consistency models, and data convergence. The courseware covers a broad capability
Feast is a machine learning feature store and MLOps data infrastructure layer. It provides a centralized system for managing and serving features across offline training and online production environments, utilizing an online feature serving layer for low-latency retrieval. The project centers on a feature registry that acts as a central catalog for defining, governing, and discovering feature services. It employs a unified data access layer to decouple feature retrieval from physical storage and includes a point-in-time data generator to create historically accurate training datasets that pr
Fluent Bit is a cloud-native log shipper and unified telemetry collector designed as a resource-efficient data pipeline. It ingests logs, metrics, and traces from multiple sources, processing them in real-time before routing the data to external storage backends. The project functions as a real-time stream processor and OpenTelemetry log processor, capable of transforming and filtering data using SQL and conditional logic. It also acts as a distributed tracing agent that can sample traces to reduce data volume while preserving full request paths. The system provides reliable data delivery th
This project is a command-line processor designed for the parsing, filtering, and transformation of structured data streams. It functions as a declarative programming environment that treats data as immutable streams, allowing users to perform complex structural modifications through the composition of small, reusable functions. By utilizing a recursive tree traversal engine, the system enables the navigation, inspection, and modification of deeply nested hierarchical data structures. The engine distinguishes itself through a stream-oriented architecture that processes input records one by on
Velociraptor is a digital forensics and incident response platform, endpoint detection and response system, and visibility tool. It provides a query engine and remote forensic collector used to hunt for indicators of compromise and perform triage across a fleet of hosts. The system is distinguished by its specialized query language for interrogating host state and parsing binary files. It features a notebook environment that combines markdown documentation with executable query cells to standardize investigative workflows and enable collaborative reporting. The platform covers a wide range o
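
Several entries above describe engines that propagate only changes through a pipeline instead of recomputing whole datasets (the differential dataflow engine, and Materialize's incrementally maintained materialized views). As a rough illustration of that idea, and not the implementation or API of any listed project, a minimal Python sketch might keep a per-key sum current from a stream of inserts and retractions. The class name IncrementalSum, the apply method, and the (key, value, diff) change format are all hypothetical, and the sketch assumes a row is only retracted after it was inserted.

```python
class IncrementalSum:
    """Per-key running sum maintained from a stream of row changes.

    Hypothetical illustration; not the API of any project listed above.
    """

    def __init__(self):
        self.totals = {}  # key -> current sum over live rows
        self.counts = {}  # key -> number of live rows

    def apply(self, changes):
        """Apply a batch of (key, value, diff) changes.

        diff is +1 for an inserted row and -1 for a retracted row.
        Returns {key: new_total} only for keys whose total changed;
        a key whose last row was retracted maps to None.
        """
        before = {}  # total of each touched key as it was before this batch
        for key, value, diff in changes:
            if key not in before:
                before[key] = self.totals.get(key)
            self.totals[key] = self.totals.get(key, 0) + value * diff
            self.counts[key] = self.counts.get(key, 0) + diff

        output = {}
        for key, old_total in before.items():
            if self.counts[key] == 0:
                # No live rows remain, so the key leaves the view.
                del self.totals[key]
                del self.counts[key]
                new_total = None
            else:
                new_total = self.totals[key]
            if new_total != old_total:
                output[key] = new_total
        return output


view = IncrementalSum()
out1 = view.apply([("a", 5, +1), ("b", 2, +1), ("a", 3, +1)])  # {"a": 8, "b": 2}
out2 = view.apply([("a", 5, -1)])                              # {"a": 3}
out3 = view.apply([("b", 2, -1)])                              # {"b": None}
```

Each call does work proportional to the size of the batch rather than the size of the stored data, and untouched keys never appear in the output, which is the property the entries above refer to as incremental processing.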