2 repositorios
Frameworks that execute data transformation and routing pipelines across a distributed cluster of workers.
Distinct from Plugin-Based ETL Frameworks: Focuses on the distributed execution engine for stream processing rather than just the plugin architecture for connectors.
Explore 2 awesome GitHub repositories matching data & databases · Distributed Stream Processors. Refine with filters or upvote what's useful.
This project is a streaming data integration framework that captures real-time database changes and synchronizes them with downstream systems. It operates as a distributed streaming ETL and database synchronizer, reading database logs and snapshots to propagate row-level modifications to target sinks. The system supports declarative data integration, allowing users to define source-to-sink data flows using SQL or YAML configurations. It distinguishes itself by automating schema evolution to maintain synchronization when source structures change and ensuring exactly-once delivery and processin
Implements a distributed streaming ETL framework for filtering, transforming, and routing data in flight.
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Connects distributed processing frameworks to the datastore to enable reading and writing data within complex streaming pipelines.