Fluvio (infinyon/fluvio)

Fluvio is a distributed event streaming platform and cloud-native streaming engine designed for collecting, persisting, and replicating real-time data streams across a distributed cluster. It functions as a real-time data pipeline for building stateful workflows that ingest, enrich, and export data between external sources and sinks.

The platform is distinguished by its use of WebAssembly: compiled modules run in-line to transform and filter data. Custom business logic can therefore reshape data in motion without a cluster restart.
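The filter-and-reshape idea can be sketched in a few lines. This is a conceptual model only, written in Python for brevity: real Fluvio modules are compiled to WebAssembly, and the names used here (`Transform`, `keep_errors`, `redact_user`, `apply_in_line`) are hypothetical and not part of any Fluvio API.

```python
import json
from typing import Callable, Iterable, Iterator, List, Optional

# Hypothetical transform type: returns a new payload, or None to drop the record.
Transform = Callable[[bytes], Optional[bytes]]


def keep_errors(payload: bytes) -> Optional[bytes]:
    """Filter step: pass through only records whose JSON 'level' is 'error'."""
    event = json.loads(payload)
    return payload if event.get("level") == "error" else None


def redact_user(payload: bytes) -> Optional[bytes]:
    """Map step: reshape the record by removing its 'user' field."""
    event = json.loads(payload)
    event.pop("user", None)
    return json.dumps(event, sort_keys=True).encode()


def apply_in_line(records: Iterable[bytes], steps: List[Transform]) -> Iterator[bytes]:
    """Run each record through the steps in order; drop it at the first None."""
    for payload in records:
        current: Optional[bytes] = payload
        for step in steps:
            current = step(current)
            if current is None:
                break
        if current is not None:
            yield current
```

Because the steps are passed in as plain values, swapping or reordering them does not require changing the loop that moves records, which loosely mirrors the point about changing logic without restarting the cluster.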

The system's capabilities include connector-based data ingestion from external protocols, log-structured immutable storage with zero-copy IO, and horizontal cluster scaling. It supports complex event-driven pipelines that use stateful processing, windowed aggregations, and partition-based data distribution.
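A toy model may help make append-only storage and partition-based distribution concrete. The sketch below is an in-memory illustration, not Fluvio's storage engine: the `Topic` class and its methods are hypothetical, and choosing a partition by CRC32 of the key is an assumption made for the example rather than Fluvio's documented partitioner.

```python
import zlib
from typing import List, Tuple


class Topic:
    """Toy in-memory topic: each partition is an append-only list of records."""

    def __init__(self, partitions: int) -> None:
        if partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self._logs: List[List[bytes]] = [[] for _ in range(partitions)]

    def partition_for(self, key: bytes) -> int:
        # Assumption for this sketch: CRC32 of the key modulo the partition count.
        return zlib.crc32(key) % len(self._logs)

    def produce(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        partition = self.partition_for(key)
        log = self._logs[partition]
        log.append(value)  # records are only ever appended, never rewritten
        return partition, len(log) - 1

    def consume(self, partition: int, offset: int) -> List[bytes]:
        """Read every record in a partition from the given offset onward."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return list(self._logs[partition][offset:])
```

Records that share a key land in the same partition, so their relative order is preserved, while different keys can spread across partitions and be read in parallel.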

The engine can be deployed as a lightweight binary on diverse system architectures, including ARM64 IoT devices for edge data processing.

Features

Streaming Data Processing - Provides a distributed engine for analyzing and transforming continuous streams of data in real time.
Event-Driven Data Pipelines - Builds event-driven pipelines that integrate streaming with stateful processing to transform data in motion.
Real-Time Data Streaming - Provides a high-performance platform for collecting, persisting, and replicating real-time event streams.
Stream Processing Runtimes - Executes custom business logic via WebAssembly modules to transform and filter data streams in motion without cluster restarts.
Partitioned Data Replicators - Distributes data partition copies across nodes with automated leader election for continuous availability.
Wasm Post-Processing - Uses WebAssembly modules to apply reusable processing functions and transformations to data streams.
Topic Management - Provides the ability to define and organize named storage entities for distributing real-time data streams.
Data Processing Pipelines - Supports the creation of hierarchical processing flows including stateful operations, data enrichment, and windowing.
Real-Time Data Processors - Implements a framework for building stateful workflows that ingest, enrich, and export data.
Multi-Source Data Integration - Integrates various external data sources directly into the streaming pipeline for ingestion.
Data Transformation Functions - Applies user-defined functions to manipulate or filter data as it moves through connectors.
Stream Transformation Logic - Allows the execution of custom business logic in various languages to transform data streams.
Distributed Event Streaming Platforms - Ships a distributed system for high-throughput collection, persistence, and replication of real-time data streams.
Data Partitioning - Distributes load and increases throughput by splitting individual topics into parallel data partitions.
External Data Connectors - Provides a pluggable architecture of inbound and outbound connectors to move data between external protocols and streams.
External Data Ingestion - Automatically pipes information from external services and endpoints into live streams via connectors.
Horizontal Scaling - Supports horizontal scaling by adding declarative processing units and storage nodes to increase cluster capacity.
Topic Offset Consumption - Enables reading a sequence of records from a stream starting at a specific offset.
Log-Structured Storage - Uses a log-structured storage engine to save immutable, append-only message segments for high-performance writes.
Connector-Based Ingestion - Uses connectors to automatically pull data from external protocols and services.
Data Stream Management - Facilitates the structural management of append-only time series streams used for events and metrics.
Event Streaming Databases - Implements a database that persists events as immutable streams and delivers them to subscribers in real time.
Programmable Stream Processing - Integrates WebAssembly to run high-performance custom processing logic for real-time stream transformations.
Stream Partitioning - Splits topics into parallel partitions to distribute traffic and increase total throughput.
Stream Record Producers - Enables sending individual data entries into a stream for downstream processing or storage.
Stream Transformations - Performs record-based operations and windowed aggregations to create materialized views from streaming data.
Wasm Transformations - Applies custom WebAssembly logic and filters to reshape data in motion.
Topic Data Distribution - Enables pushing data into a topic for distribution and reading it back for processing.
External Service Integrations - Polls data from external endpoints using connectors to automatically populate specific stream topics.
Cloud Native Infrastructure - Provides a horizontally scalable data architecture designed for cloud-native environments and edge devices.
Horizontal Scaling Deployments - Scales deployments horizontally across distributed processing units and storage nodes.
Cluster Resource Management - Provides a dedicated control plane API to manage the lifecycle of topics, replicas, and processing units.
Message Stream Consumer Groups - Reads records from a topic, starting either from the beginning or from the most recent entry.
Topic Producers - Provides writer abstractions for producing messages to topics with partition assignment and compression.
Stream Topic Publishing - Supports sending records to a stream with configurable batching and compression to optimize throughput.
In-Line Stream Processing - Injects compiled WebAssembly modules for high-performance in-line data transformations and payload detection.
Wasm Stream Processors - Executes compiled WebAssembly modules to transform and filter data in motion.
Storage Immutability - Treats stored data as fixed once written and persists it using zero-copy IO.
Composable Workflows - Provides modular systems for chaining functions and operators into reusable, independently tested dataflow packages.
Stream State Accumulation - Accumulates state across records to compute window-based aggregates on real-time data streams.
Streaming Data Transformations - Executes custom functions to process or modify data in motion during production or consumption.
Stream Record Consumption - Provides capabilities to read records from a stream by starting from a specific offset or listening for arrivals.
Client Libraries - Provides native language bindings to integrate streaming functions directly into application logic.
Custom Connector Development - Provides a command line tool for generating, testing, and deploying custom input and output adapters.
Data Destination Connectors - Ships configuration interfaces for establishing connections to external target storage systems and databases.
Transformation Chains - Sequences multiple processing modules into a pipeline for multi-step data manipulations.
Edge Data Processing - Executes data processing and transformation binaries on ARM64 and other IoT devices.
Data Export Connectors - Pushes processed information to external databases, object storage, and search engines via outbound connectors.
Stream Deduplication - Removes redundant records from real-time data streams to ensure only unique entries are processed.
Disk IO Optimization - Implements zero-copy disk IO to persist immutable event records while minimizing memory copying overhead.
Server-Side Data Transformations - Executes programmable functions on the client or server side to modify, filter, or reshape data streams.
Sink Data Loading - Implements loading of processed data streams into various target storage systems and analytical databases.
State Management Stores - Provides centralized systems for managing and persisting application state using typed schemas.
Deployment Command Line Interfaces - Ships a terminal-based tool for provisioning workers to execute data processing logic.
Pipeline Orchestration - Manages the execution flow and data lifecycle of complex multi-stage analytics pipelines via declarative APIs.
Connector Deployment - Allows loading compiled data processing modules into a running streaming cluster for active use.
Resource Coordination - Implements a dedicated control plane to coordinate the lifecycle of topics and replicas independently from the data path.
Distributed Leader Election - Uses automated leader election to maintain high availability and coordinate writes across the distributed cluster.
Streaming Cluster Orchestration - Provides the ability to initialize and start a distributed cluster to manage stream collection and processing.
Edge Computing Runtimes - Can be deployed as a lightweight binary on diverse architectures including ARM64 IoT devices.
External Service Connectivity - Interfaces with ingress and egress services through a pluggable library of data connectors.
Dataflow Visualizers - Provides graphical representations of dataflow hierarchies and runtime analytics.
Cluster Health Monitoring - Provides interfaces for querying the health and status of stream partitions and replication.
Runtime State Inspection - Allows inspection of real-time metrics and internal state of active dataflows.
Data Pipelines - Programmable data streaming platform with in-line computation.
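Two of the features listed above, window-based aggregates and stream deduplication, can be illustrated with a short sketch. It assumes fixed, non-overlapping (tumbling) windows and a bounded memory of recently seen record ids. The function names and the eviction policy are hypothetical choices for this example, not a description of how Fluvio implements either feature.

```python
from collections import OrderedDict
from typing import Dict, Iterable, Iterator, Tuple


def tumbling_counts(
    events: Iterable[Tuple[int, str]], window_ms: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) using fixed, non-overlapping windows."""
    counts: Dict[Tuple[int, str], int] = {}
    for timestamp_ms, key in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        slot = (window_start, key)
        counts[slot] = counts.get(slot, 0) + 1
    return counts


def deduplicate(
    records: Iterable[Tuple[str, bytes]], max_tracked: int
) -> Iterator[Tuple[str, bytes]]:
    """Yield a record only if its id is not among the most recently tracked ids."""
    seen: "OrderedDict[str, None]" = OrderedDict()
    for record_id, payload in records:
        if record_id in seen:
            continue  # duplicate of a tracked id: skip it
        seen[record_id] = None
        if len(seen) > max_tracked:
            seen.popitem(last=False)  # evict the oldest tracked id
        yield record_id, payload
```

Bounding the tracked ids keeps memory constant on an unbounded stream, at the cost of letting a duplicate through once its id has been evicted.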

Open-source alternatives to Fluvio

hazelcast/hazelcast

6,570 - View on GitHub

Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis

apache/storm

6,683 - View on GitHub

Storm is a distributed stream processing framework designed to execute unbounded computations across a cluster to process real-time data streams. It functions as a data pipeline orchestrator that allows users to define and deploy declarative data flow graphs connecting streaming sources to processing components. The system operates as a multi-tenant distributed compute engine that isolates workloads and limits resource usage across shared clusters using dedicated pools and access control. It is also a secure distributed processing engine that employs encrypted node communication and SSL-secur

zhisheng17/flink-learning

15,071 - View on GitHub

This project is a collection of educational resources and reference implementations for the Apache Flink stream processing framework. It provides a learning resource focused on mastering distributed stream processing through implementation guides, performance tuning tutorials, and practical examples. The repository features detailed walkthroughs for building real-time data pipelines using the DataStream and Table APIs. It includes specific integration examples for connecting Apache Flink with Kafka brokers and Elasticsearch indices, as well as reference implementations for real-time deduplica

apache/pinot

6,098 - View on GitHub

Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
