4 repositories
Executes streaming pipelines as a directed acyclic graph of parallel subtasks, routing data across workers via forward or shuffle edges.
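The difference between the two edge types can be illustrated with a minimal sketch. It is not taken from any of the listed projects; the function names, the `edge` strings, and the use of CRC32 as the partitioning hash are assumptions made for illustration. A forward edge sends each record to the downstream subtask with the same index, while a shuffle edge picks the downstream subtask from a hash of the record key.

```python
from zlib import crc32


def shuffle_target(key: str, downstream_parallelism: int) -> int:
    """Pick a downstream subtask from a stable hash of the key (hypothetical helper)."""
    return crc32(key.encode("utf-8")) % downstream_parallelism


def route(records, upstream_index: int, downstream_parallelism: int, edge: str):
    """Group (key, value) records into one outbox per downstream subtask."""
    outboxes = {i: [] for i in range(downstream_parallelism)}
    for key, value in records:
        if edge == "forward":
            # Forward edge: subtask i talks only to downstream subtask i,
            # so both operators must have at least upstream_index + 1 subtasks.
            if upstream_index >= downstream_parallelism:
                raise ValueError("forward edge needs matching parallelism")
            target = upstream_index
        elif edge == "shuffle":
            # Shuffle edge: records with the same key always land together.
            target = shuffle_target(key, downstream_parallelism)
        else:
            raise ValueError(f"unknown edge type: {edge}")
        outboxes[target].append((key, value))
    return outboxes


records = [("user-a", 1), ("user-b", 2), ("user-a", 3)]
forwarded = route(records, upstream_index=1, downstream_parallelism=4, edge="forward")
shuffled = route(records, upstream_index=1, downstream_parallelism=4, edge="shuffle")
```

With a forward edge no data crosses subtask boundaries, which keeps routing cheap; a shuffle edge pays for redistribution but guarantees that keyed operators such as aggregations see every record for a key in one place.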
Distinct from Directed Acyclic Graph Execution Engines: the closest existing tags are either ML-specific (Distributed Execution) or specific to symbolic execution (Directed Acyclic Graph Execution Engines), so neither covers distributed execution of stream-processing DAGs.
Explore 4 GitHub repositories matching Data & Databases · Distributed Stream Execution.
Apache Storm is a distributed stream processing framework and real-time data processing engine. It functions as a fault-tolerant distributed computing system designed to analyze data in motion across a cluster of machines for continuous stream computation. The system enables the creation of fault-tolerant data pipelines and scalable event processing by distributing workloads across a network of computing nodes. This architecture ensures low latency and high throughput for live data while allowing the system to recover automatically from individual node failures. The framework provides capabi
Executes streaming pipelines as a directed acyclic graph distributed across a cluster of worker nodes.
OctoSQL is a federated SQL query engine, data transformer, and streaming SQL processor. It allows users to execute single SQL statements across multiple disparate data sources, including different database types and file formats, to merge and transform results into a unified set. The system distinguishes itself by treating CSV, JSONLines, and Parquet files as virtual tables and utilizing a plugin-based architecture to extend connectivity to external storage engines. It functions as a streaming processor for infinite data streams, using watermarks, retractions, and tumbling windows to maintain
Executes queries on endless data streams using watermarks and retractions to handle out-of-order events.
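As a rough illustration of how a watermark lets a tumbling window tolerate out-of-order events, here is a minimal sketch. It is not OctoSQL's implementation: the `TumblingCounter` class, its parameters, and the heuristic of setting the watermark to the largest event time seen minus a fixed bound are all assumptions, and retractions are not modeled (events that arrive after their window has closed are simply dropped).

```python
from collections import defaultdict


class TumblingCounter:
    """Counts events per fixed-size window and emits a window once the
    watermark passes its end (hypothetical class, integer timestamps)."""

    def __init__(self, window_size: int, max_out_of_orderness: int):
        self.window_size = window_size
        self.max_out_of_orderness = max_out_of_orderness
        self.counts = defaultdict(int)
        self.watermark = float("-inf")

    def on_event(self, event_time: int):
        """Process one event; return a list of (start, end, count) for closed windows."""
        start = event_time - event_time % self.window_size
        if start + self.window_size <= self.watermark:
            # The window was already emitted; this sketch drops the late event.
            return []
        self.counts[start] += 1
        # Assumed heuristic: events lag the newest one by at most the bound.
        self.watermark = max(self.watermark, event_time - self.max_out_of_orderness)
        closed = sorted(
            s for s in self.counts if s + self.window_size <= self.watermark
        )
        return [(s, s + self.window_size, self.counts.pop(s)) for s in closed]


counter = TumblingCounter(window_size=10, max_out_of_orderness=5)
emitted = [counter.on_event(t) for t in (1, 12, 3, 16, 4, 27)]
```

In this run the event at time 3 arrives after the event at time 12 but is still counted, because the watermark has not yet passed the end of window [0, 10). The event at time 4 arrives after that window was emitted and is dropped; a system with retractions could instead withdraw the earlier result and emit a corrected one.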
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Executes streaming pipelines as a distributed DAG of parallel subtasks for high throughput and fault tolerance.
vim-dadbod is a database interface for the Vim editor that allows for the execution of SQL and NoSQL queries. It functions as a connection manager and query runner, enabling users to interact with databases using connection URLs. The project acts as a bridge to native command-line interfaces, providing a wrapper to launch interactive database consoles. This integration allows users to run commands from the editor and view the results within a preview window. The system manages database connections through URL-based configurations and environment variables. It handles the execution of queries
Streams editor buffer contents to external database binaries for query execution.