3 repositories
Execution engines that process query plans through a hierarchy of stages, tasks, and operators.
Distinct from Distributed Query Processing and from general stream processing: focuses on the execution of query plans through distributed stages and operators.
Explore 3 GitHub repositories matching Data & Databases · Distributed Query Stream Processors.
Presto is a distributed SQL query engine designed for high-performance analytical processing across heterogeneous data sources. It functions as a data federation platform and massively parallel processing engine, allowing users to execute interactive queries against diverse storage systems without requiring data migration. By mapping remote metadata and structures to a unified relational namespace, it enables seamless cross-platform analysis through a standard SQL interface. The engine distinguishes itself through a pluggable connector architecture and a shared-nothing distributed processing
Executes query plans through a hierarchy of stages, tasks, and operators that transform and exchange data across the cluster.
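The stage, task, and operator hierarchy described above can be illustrated with a small, single-process sketch. This is not Presto's code or API: the names `Stage`, `Task`, `filter_op`, `project_op`, `sum_op`, and `run_plan` are hypothetical, and the "exchange" between stages is simulated by handing one stage's combined output to the next stage as a single partition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Operator = Callable[[Iterable[Row]], Iterable[Row]]


def filter_op(predicate: Callable[[Row], bool]) -> Operator:
    """Operator that keeps only rows matching the predicate."""
    def run(rows: Iterable[Row]) -> Iterable[Row]:
        return (r for r in rows if predicate(r))
    return run


def project_op(columns: List[str]) -> Operator:
    """Operator that keeps only the named columns."""
    def run(rows: Iterable[Row]) -> Iterable[Row]:
        return ({c: r[c] for c in columns} for r in rows)
    return run


def sum_op(column: str) -> Operator:
    """Operator that reduces its input to a single row holding the column sum."""
    def run(rows: Iterable[Row]) -> Iterable[Row]:
        yield {"sum_" + column: sum(r[column] for r in rows)}
    return run


@dataclass
class Task:
    """One task runs a stage's operator chain over one partition (split)."""
    operators: List[Operator]

    def run(self, split: Iterable[Row]) -> List[Row]:
        rows: Iterable[Row] = iter(split)
        for op in self.operators:
            rows = op(rows)
        return list(rows)


@dataclass
class Stage:
    """One stage creates one task per input partition."""
    operators: List[Operator]

    def run(self, splits: List[List[Row]]) -> List[Row]:
        output: List[Row] = []
        for split in splits:
            output.extend(Task(self.operators).run(split))
        return output


def run_plan(stages: List[Stage], splits: List[List[Row]]) -> List[Row]:
    """Run stages in order; a stage's output becomes one partition for the next."""
    output: List[Row] = []
    partitions = splits
    for stage in stages:
        output = stage.run(partitions)
        partitions = [output]  # simulated exchange: gather into one partition
    return output


splits = [
    [{"region": "eu", "amount": 10}, {"region": "us", "amount": 5}],
    [{"region": "eu", "amount": 7}],
]
plan = [
    Stage([filter_op(lambda r: r["region"] == "eu"), project_op(["amount"])]),
    Stage([sum_op("amount")]),
]
result = run_plan(plan, splits)
print(result)
```

In a real distributed engine the tasks of a stage would run in parallel on different workers and the exchange would move data over the network; the sketch only shows how the three levels nest.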
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Streams data blocks between query stages to enable low-latency distributed execution.
YDB is a distributed SQL database and analytical engine designed for horizontal scalability and strong consistency. It functions as a multi-model system that supports transactional and analytical workloads through a distributed architecture providing serializable ACID transactions. The system distinguishes itself through broad protocol compatibility, implementing the PostgreSQL wire protocol for standard SQL drivers and the Kafka protocol for messaging and streaming. It also serves as a vector database, supporting vector indexes and approximate nearest neighbor search for semantic search and embeddings. The platform manages data using a hybrid storage model with row-oriented and column-oriented formats, using vectorized query execution for petabyte-scale analytics. Its operational surface includes change data capture streaming, persistent queues with exactly-once delivery, and multi-zone high availability. Deployment and lifecycle management are supported through a Kubernetes operator and infrastructure-as-code provisioning.
Runs streaming queries that automatically restart on failure and use checkpoints to persist state.
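The checkpoint-and-restart behaviour described above can be sketched in a few lines. This is a generic illustration, not YDB's implementation or API: the functions `run_streaming_sum` and `run_with_restart`, the JSON checkpoint file, and the `fail_at` parameter used to simulate a crash are all assumptions made for the example.

```python
import json
import os
import tempfile
from typing import List, Optional


def run_streaming_sum(events: List[int], checkpoint_path: str,
                      fail_at: Optional[int] = None) -> int:
    """Sum events, persisting (offset, total) after every event."""
    state = {"offset": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        # Resume from the last persisted state instead of starting over.
        with open(checkpoint_path) as f:
            state = json.load(f)

    for offset in range(state["offset"], len(events)):
        if fail_at is not None and offset == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += events[offset]
        state["offset"] = offset + 1
        # Write to a temporary file, then rename, so a crash never leaves
        # a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["total"]


def run_with_restart(events: List[int], checkpoint_path: str,
                     fail_at: Optional[int], max_restarts: int = 3) -> int:
    """Restart the query on failure; each restart resumes from the checkpoint."""
    for _ in range(max_restarts + 1):
        try:
            return run_streaming_sum(events, checkpoint_path, fail_at)
        except RuntimeError:
            fail_at = None  # treat the failure as transient for this demo
    raise RuntimeError("too many restarts")


with tempfile.TemporaryDirectory() as workdir:
    total = run_with_restart([1, 2, 3, 4],
                             os.path.join(workdir, "checkpoint.json"),
                             fail_at=2)
print(total)
```

Because the offset and the running total are saved together, the restarted run continues at the third event and does not count the first two twice.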