4 repositories
Executes streaming pipelines as a directed acyclic graph of parallel subtasks, routing data across workers via forward or shuffle edges.
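The difference between forward and shuffle edges can be illustrated with a minimal Python sketch. It is not taken from any of the listed projects; the function names (`forward_route`, `shuffle_route`, `route_batch`) and the choice of CRC32 as the hash are assumptions made only for illustration.

```python
from zlib import crc32


def forward_route(upstream_index: int, downstream_parallelism: int) -> int:
    """Forward edge: upstream subtask i sends only to downstream subtask i."""
    if upstream_index >= downstream_parallelism:
        raise ValueError("a forward edge needs a matching downstream subtask")
    return upstream_index


def shuffle_route(key: str, downstream_parallelism: int) -> int:
    """Shuffle edge: hash the key so equal keys land on the same subtask."""
    return crc32(key.encode("utf-8")) % downstream_parallelism


def route_batch(records, upstream_index, downstream_parallelism, edge):
    """Group (key, value) records by destination subtask index."""
    outboxes = {i: [] for i in range(downstream_parallelism)}
    for key, value in records:
        if edge == "forward":
            dest = forward_route(upstream_index, downstream_parallelism)
        elif edge == "shuffle":
            dest = shuffle_route(key, downstream_parallelism)
        else:
            raise ValueError(f"unknown edge type: {edge}")
        outboxes[dest].append((key, value))
    return outboxes
```

A forward edge keeps data on the same subtask index and avoids repartitioning, while a shuffle edge repartitions by key, which keyed aggregations and joins typically need.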
Distinct from Directed Acyclic Graph Execution Engines: the closest existing tags are either ML-specific (Distributed Execution) or symbolic-execution-specific (Directed Acyclic Graph Execution Engines), so neither covers distributed execution of stream-processing DAGs.
Explore 4 awesome GitHub repositories matching data & databases · Distributed Stream Execution.
Apache Storm is a distributed stream processing framework and real-time data processing engine. It functions as a fault-tolerant distributed computing system designed to analyze data in motion across a cluster of machines for continuous stream computation. The system enables the creation of fault-tolerant data pipelines and scalable event processing by distributing workloads across a network of computing nodes. This architecture ensures low latency and high throughput for live data while allowing the system to recover automatically from individual node failures. The framework provides capabi
Executes streaming pipelines as a directed acyclic graph distributed across a cluster of worker nodes.
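Octosql is a federated SQL query engine, data transformer, and streaming SQL processor. It lets users run a single SQL statement across multiple heterogeneous data sources, including different types of databases and file formats, merging and transforming the result sets. The system is distinctive in treating CSV, JSONLines, and Parquet files as virtual tables and in using a plugin-based architecture to extend connectivity to external storage engines. As a stream processor for unbounded data streams, it uses watermarks, retractions, and tumbling windows to maintain consistency for out-of-order events. It can also serve as a SQL data generator, producing synthetic datasets and record streams through table-valued functions. The engine supports cross-source joins and multi-source analytics, optimized through predicate push-down to the source to reduce data transfer. It manages complex data through a static type system that includes union types, and it provides query execution plan visualization for better observability.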
Executes queries on endless data streams using watermarks and retractions to handle out-of-order events.
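As a rough illustration of how a watermark lets tumbling windows be finalized despite out-of-order events, here is a minimal Python sketch. It is not Octosql's implementation: the `TumblingCounter` class, its `allowed_lateness` parameter, and the rule of deriving the watermark from the largest event time seen are all assumptions made for this example. Retractions are not modelled, so events that arrive after their window has closed are simply counted as dropped.

```python
from collections import defaultdict


class TumblingCounter:
    """Counts events per fixed-size window; closes windows by watermark."""

    def __init__(self, window_size: int, allowed_lateness: int):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None
        self.open_windows = defaultdict(int)  # window start -> event count
        self.dropped = 0  # events that arrived after their window closed

    def watermark(self):
        """Assumed heuristic: largest event time seen minus a fixed lateness."""
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def add(self, event_time: int):
        """Ingest one event; return the (window_start, count) pairs it closes."""
        start = event_time - event_time % self.window_size
        wm = self.watermark()
        if wm is not None and start + self.window_size <= wm:
            self.dropped += 1  # window already finalized, event is too late
            return []
        self.open_windows[start] += 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        wm = self.watermark()
        closed = sorted(
            s for s in self.open_windows if s + self.window_size <= wm
        )
        return [(s, self.open_windows.pop(s)) for s in closed]
```

An engine that supports retractions could instead re-emit a corrected result for a window when a late event arrives, rather than dropping the event as this sketch does.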
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Executes streaming pipelines as a distributed DAG of parallel subtasks for high throughput and fault tolerance.
vim-dadbod is a database interface for the Vim editor that allows for the execution of SQL and NoSQL queries. It functions as a connection manager and query runner, enabling users to interact with databases using connection URLs. The project acts as a bridge to native command-line interfaces, providing a wrapper to launch interactive database consoles. This integration allows users to run commands from the editor and view the results within a preview window. The system manages database connections through URL-based configurations and environment variables. It handles the execution of queries
Streams editor buffer contents to external database binaries for query execution.