4 Repos
Executes streaming pipelines as a directed acyclic graph of parallel subtasks, routing data across workers via forward or shuffle edges.
Distinct from Directed Acyclic Graph Execution Engines: the existing candidate tags are either ML-specific (Distributed Execution) or specific to symbolic execution (Directed Acyclic Graph Execution Engines), so neither covers distributed execution of stream-processing DAGs.
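The forward and shuffle edges named in the tag description can be illustrated with a minimal sketch. The function names (`forward_route`, `shuffle_route`, `route_batch`) and the CRC32 hash are assumptions chosen for illustration; none of the listed projects is claimed to route records this way.

```python
import zlib


def forward_route(upstream_index: int, downstream_parallelism: int) -> int:
    """Forward edge: upstream subtask i sends only to downstream subtask i.

    This only works when both operators run with the same parallelism.
    """
    if not 0 <= upstream_index < downstream_parallelism:
        raise ValueError("forward edges need matching parallelism")
    return upstream_index


def shuffle_route(key: str, downstream_parallelism: int) -> int:
    """Shuffle edge: hash the key so equal keys always reach the same subtask.

    CRC32 is used because it is stable across processes (hypothetical choice).
    """
    return zlib.crc32(key.encode("utf-8")) % downstream_parallelism


def route_batch(records, downstream_parallelism: int):
    """Partition (key, value) records across downstream subtasks via a shuffle edge."""
    partitions = [[] for _ in range(downstream_parallelism)]
    for key, value in records:
        partitions[shuffle_route(key, downstream_parallelism)].append((key, value))
    return partitions


if __name__ == "__main__":
    events = [("user-a", 1), ("user-b", 2), ("user-a", 3)]
    for index, partition in enumerate(route_batch(events, 4)):
        print(index, partition)
```

A forward edge keeps records on the same subtask index and avoids repartitioning, while a shuffle edge repartitions by key so that keyed state (for example a per-user aggregate) lives on exactly one subtask.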
Explore 4 GitHub repositories matching data & databases · Distributed Stream Execution.
Apache Storm is a distributed stream processing framework and real-time data processing engine. It functions as a fault-tolerant distributed computing system designed to analyze data in motion across a cluster of machines for continuous stream computation. The system enables the creation of fault-tolerant data pipelines and scalable event processing by distributing workloads across a network of computing nodes. This architecture ensures low latency and high throughput for live data while allowing the system to recover automatically from individual node failures. The framework provides capabi
Executes streaming pipelines as a directed acyclic graph distributed across a cluster of worker nodes.
Octosql is a federated SQL query engine, data transformer, and streaming SQL processor. It runs a single SQL statement across multiple heterogeneous data sources, including different database types and file formats, to merge and transform the results into a unified dataset. The system treats CSV, JSONLines, and Parquet files as virtual tables and uses a plugin-based architecture to extend connectivity to external storage systems. It acts as a streaming processor for unbounded data streams, using watermarks, retractions, and tumbling windows to maintain consistency when events arrive out of order. It also serves as a SQL data generator that can produce synthetic datasets and record streams through table-valued functions. The engine supports cross-source joins and multi-source analyses, optimized by source-side predicate push-down to reduce data transfer. It manages complex data through a static type system with union types and offers observability by visualizing query execution plans.
Executes queries on endless data streams using watermarks and retractions to handle out-of-order events.
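As a rough illustration of how a watermark lets a tumbling window tolerate out-of-order events, consider the sketch below. It is a simplified model under stated assumptions, not Octosql's implementation: the class `TumblingWindowCounter` and its methods are hypothetical, event times are plain integers, and retractions are left out (late events are dropped rather than used to correct an emitted result).

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per fixed-size, non-overlapping event-time window."""

    def __init__(self, window_size: int):
        self.window_size = window_size
        self.counts = defaultdict(int)  # window start -> events seen so far
        self.watermark = float("-inf")  # no event older than this is expected

    def on_event(self, event_time: int) -> bool:
        """Add an event to its window; return False if the window already closed."""
        window_start = event_time - event_time % self.window_size
        if window_start + self.window_size <= self.watermark:
            return False  # too late: the result for this window was emitted
        self.counts[window_start] += 1
        return True

    def on_watermark(self, watermark: int):
        """Advance the watermark and emit (window_start, count) for closed windows."""
        self.watermark = max(self.watermark, watermark)
        closed = sorted(
            start for start in self.counts
            if start + self.window_size <= self.watermark
        )
        return [(start, self.counts.pop(start)) for start in closed]


if __name__ == "__main__":
    counter = TumblingWindowCounter(window_size=10)
    for t in (3, 12, 7):  # 7 arrives after 12, i.e. out of order
        counter.on_event(t)
    print(counter.on_watermark(10))
```

Results are held back until the watermark passes the end of a window, so the out-of-order event at time 7 is still counted in the window starting at 0. A system with retractions could instead emit early and later withdraw and replace the result.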
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Executes streaming pipelines as a distributed DAG of parallel subtasks for high throughput and fault tolerance.
vim-dadbod is a database interface for the Vim editor that allows for the execution of SQL and NoSQL queries. It functions as a connection manager and query runner, enabling users to interact with databases using connection URLs. The project acts as a bridge to native command-line interfaces, providing a wrapper to launch interactive database consoles. This integration allows users to run commands from the editor and view the results within a preview window. The system manages database connections through URL-based configurations and environment variables. It handles the execution of queries
Streams editor buffer contents to external database binaries for query execution.