4 repositories
Executes streaming pipelines as a directed acyclic graph of parallel subtasks, routing data across workers via forward or shuffle edges.
Distinct from Directed Acyclic Graph Execution Engines: the existing candidate tags are either ML-specific (Distributed Execution) or specific to symbolic execution (Directed Acyclic Graph Execution Engines), so neither covers distributed execution of stream-processing DAGs.
Explore 4 awesome GitHub repositories matching data & databases · Distributed Stream Execution. Refine with filters or upvote what's useful.
Apache Storm is a distributed stream processing framework and real-time data processing engine. It functions as a fault-tolerant distributed computing system designed to analyze data in motion across a cluster of machines for continuous stream computation. The system enables the creation of fault-tolerant data pipelines and scalable event processing by distributing workloads across a network of computing nodes. This architecture ensures low latency and high throughput for live data while allowing the system to recover automatically from individual node failures. The framework provides capabi
Executes streaming pipelines as a directed acyclic graph distributed across a cluster of worker nodes.
Octosql is a federated SQL query engine, data transformer, and streaming SQL processor. It lets users run a single SQL statement across multiple disparate data sources, including different types of databases and file formats, to combine and transform the results into a unified set. The system stands out by treating CSV, JSONLines, and Parquet files as virtual tables and by using a plugin-based architecture to extend connectivity to external storage engines. It works as a streaming processor for unbounded data streams, using watermarks, retractions, and tumbling windows to maintain consistency when events arrive out of order. It also serves as a SQL data generator capable of producing synthetic datasets and record streams through table-valued functions. The engine supports joins between data sources and multi-source analysis, optimized by pushing predicates down to the source to reduce data transfer. It handles complex data through a static type system with union types and provides observability through visualization of query execution plans.
Executes queries on endless data streams using watermarks and retractions to handle out-of-order events.
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Executes streaming pipelines as a distributed DAG of parallel subtasks for high throughput and fault tolerance.
vim-dadbod is a database interface for the Vim editor that allows for the execution of SQL and NoSQL queries. It functions as a connection manager and query runner, enabling users to interact with databases using connection URLs. The project acts as a bridge to native command-line interfaces, providing a wrapper to launch interactive database consoles. This integration allows users to run commands from the editor and view the results within a preview window. The system manages database connections through URL-based configurations and environment variables. It handles the execution of queries
Streams editor buffer contents to external database binaries for query execution.