30 open-source projects similar to lsds/saber, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Saber alternative.
Apache DataFusion Ballista Distributed Query Engine
Apache Flink is a distributed processing engine designed for both high-throughput, low-latency data streams and finite batch workloads. It functions as a stateful stream processor and a SQL stream processing engine, providing a unified runtime to execute relational queries and event-based transformations. The system is distinguished by its ability to manage persistent operator state to ensure exactly-once processing guarantees and consistency during failures. It features specialized capabilities for complex event processing to detect temporal patterns and handles out-of-order events using eve
Apache Heron (Incubating) is a real-time, distributed, fault-tolerant stream processing engine from Twitter.
Apache Spark is a unified distributed data processing engine designed for large-scale data analysis and computation graphs. It functions as a distributed machine learning framework, a graph processing system, a real-time stream processor, and a SQL analytics engine. The system enables the execution of distributed SQL querying, large-scale graph analysis, and real-time stream analytics across clusters of machines. It also provides a scalable environment for implementing machine learning algorithms and predictive model development on massive datasets. The engine incorporates relational query e
Storm is a distributed stream processing framework designed to execute unbounded computations across a cluster to process real-time data streams. It functions as a data pipeline orchestrator that allows users to define and deploy declarative data flow graphs connecting streaming sources to processing components. The system operates as a multi-tenant distributed compute engine that isolates workloads and limits resource usage across shared clusters using dedicated pools and access control. It is also a secure distributed processing engine that employs encrypted node communication and SSL-secur
A high-performance Rust stream processing engine that integrates AI capabilities, providing real-time data processing and intelligent analysis.
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Real-time decision features without streaming infra. Turn live events into product reflexes — no Kafka, no Flink, no feature store.
Cocoindex is an incremental data processing engine that builds and maintains live indexes for AI agents, with a core focus on codebase indexing and knowledge graph extraction. The engine uses a function-graph execution model where user-defined Python functions are composed into a directed acyclic graph, and it processes data incrementally so only changed source records or code paths are re-computed, avoiding full recomputation at any scale. It supports automatic schema inference from transformation pipeline type annotations and provides full data lineage tracing, tagging every output record wi
The core libraries of the teknek stream processing platform
Haskell distributed stream processing with exactly-once semantics
Distributed Stream and Batch Processing
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
Trill is a single-node query processor for temporal or streaming data.
Data management for the sensor-edge-cloud continuum
A platform that makes it easy for developers to build realtime, cost-effective, operations-focused applications
Kubernetes-native platform to run massively parallel data/streaming jobs
Distributed, masterless, high-performance, fault-tolerant data processing
Spooker is a dynamic framework for processing high-volume data streams through pipelines.
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with identical logic, the platform ensures exactly-once processing semantics and consistent results across diverse data sources. The framework distinguishes itself through its specialized support for real-time artificial intelligence and retrieval-augmented generation. It features in
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen