1 repo
Methods for executing and parallelizing data queries across multiple nodes in a distributed environment.
Distinguishing note: Focuses on the execution plan and streaming of data from distributed sources.
Explore 1 awesome GitHub repository matching data & databases · Distributed Query Processing. Refine with filters or upvote what's useful.
RethinkDB is a distributed, document-oriented database designed to store and manage JSON-formatted data across scalable clusters. It utilizes a custom log-structured storage engine with B-Tree indexing to ensure high-performance disk I/O and data persistence. The system maintains high availability through automatic sharding and replication, employing a primary-replica voting consensus mechanism to handle node failures and ensure consistent cluster operations. A defining characteristic of the platform is its reactive changefeed engine, which allows applications to subscribe to live data update
RethinkDB executes queries by transforming them into parallelized, lazy-evaluated execution plans that stream data chunks from multiple servers to the client for efficient processing.