Systems that defer or reorder data processing operations to improve performance on large datasets.
Distinguishing note: Focuses on the execution strategy of data queries rather than the storage or retrieval mechanism itself.
Four GitHub repositories in the data & databases · Query Optimization Engines category.
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It runs as a distributed data warehouse capable of processing petabytes of data, and it also offers an embedded engine that integrates directly into applications, providing native query capabilities without external dependencies. The system handles high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring.
Transforms complex analytical operations into optimized, multi-threaded execution plans to minimize resource usage during data processing.
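A common way to parallelize a `GROUP BY` aggregation is to let each worker build its own partial hash table over a share of the rows and then merge those partial states. The sketch below shows that two-phase shape in Python; the function names are hypothetical, it is not ClickHouse's C++ implementation, and because of Python's GIL the threads here illustrate the plan structure rather than deliver a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    # Phase 1: each worker aggregates into its own dict, so no locking is needed.
    acc = {}
    for key, value in chunk:
        acc[key] = acc.get(key, 0) + value
    return acc


def merge_states(states):
    # Phase 2: combine the per-worker partial results into the final answer.
    result = {}
    for state in states:
        for key, value in state.items():
            result[key] = result.get(key, 0) + value
    return result


def parallel_group_sum(rows, num_workers=4):
    # Stride the input so every worker receives a roughly equal share of rows.
    chunks = [rows[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        states = list(pool.map(partial_sum, chunks))
    return merge_states(states)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(parallel_group_sum(rows))  # {'a': 4, 'b': 7, 'c': 4}
```

Keeping per-worker state private until the merge step avoids contention on a shared hash table, which is the main reason this pattern scales with core count in compiled engines.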
Nushell is a cross-platform shell and programming language that treats all input and output as structured data rather than raw text streams. By enforcing data types and command signatures, it provides a consistent environment for building robust, pipeline-oriented workflows. Users chain commands that pass structured objects between stages, enabling complex data processing and automation tasks that behave predictably across operating systems. What distinguishes the project is its focus on interactive data exploration and modular extensibility.
Defers computation of complex data transformations until the final result is requested to optimize performance for large datasets.
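The general idea of deferred computation can be sketched as a pipeline object that only records each step and runs nothing until a result is requested. The `LazyPipeline` class and its `where`, `each`, `first`, and `collect` methods below are hypothetical; the names loosely echo shell-style commands but this is not Nushell's API or its dataframe engine.

```python
import itertools


class LazyPipeline:
    """Hypothetical pipeline that records steps and executes them only on collect()."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def where(self, predicate):
        # Record a filter step; nothing is evaluated yet.
        return LazyPipeline(self._source, self._steps + (("where", predicate),))

    def each(self, fn):
        # Record a per-item transformation.
        return LazyPipeline(self._source, self._steps + (("each", fn),))

    def first(self, n):
        # Record a limit on how many results are needed.
        return LazyPipeline(self._source, self._steps + (("first", n),))

    def collect(self):
        # Build a chain of lazy iterators, then pull results through it once.
        stream = iter(self._source)
        for kind, arg in self._steps:
            if kind == "where":
                stream = filter(arg, stream)
            elif kind == "each":
                stream = map(arg, stream)
            elif kind == "first":
                stream = itertools.islice(stream, arg)
        return list(stream)


calls = []


def expensive_double(x):
    calls.append(x)  # track how often the costly step actually runs
    return x * 2


plan = (LazyPipeline(range(1_000_000))
        .where(lambda x: x % 2 == 0)
        .each(expensive_double)
        .first(3))
print(len(calls))      # 0: building the plan ran nothing
print(plan.collect())  # [0, 4, 8]
print(len(calls))      # 3: only the rows needed for the result were transformed
```

Because the limit is known before execution starts, the expensive step runs three times instead of half a million times.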
RethinkDB is a distributed, document-oriented database that stores and manages JSON data across scalable clusters. It uses a custom log-structured storage engine with B-Tree indexing for high-performance disk I/O and durable persistence. The system maintains high availability through automatic sharding and replication, using a primary-replica voting consensus mechanism to handle node failures and keep cluster operations consistent. A defining characteristic of the platform is its reactive changefeed engine, which lets applications subscribe to live data updates.
RethinkDB executes queries on the server with automatic parallelization across CPU cores and cluster nodes, while using lazy evaluation to minimize resource consumption.
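One way lazy evaluation saves resources is a cursor that fetches results in batches and requests the next batch only when the consumer asks for more. The `ShardCursor` class below is a hypothetical stand-in for a server round trip, not RethinkDB's driver API, and it covers only the lazy half of the sentence above, not cross-node parallelization.

```python
from itertools import islice


class ShardCursor:
    """Hypothetical cursor that pulls documents one batch at a time."""

    def __init__(self, documents, batch_size):
        self._documents = documents
        self._batch_size = batch_size
        self.batches_fetched = 0

    def __iter__(self):
        for start in range(0, len(self._documents), self._batch_size):
            # Simulates one round trip to the server for the next batch.
            self.batches_fetched += 1
            yield from self._documents[start:start + self._batch_size]


docs = [{"id": i, "score": i % 7} for i in range(10_000)]
cursor = ShardCursor(docs, batch_size=100)

# The filter and the limit are both lazy, so iteration stops after 3 matches.
high = (d for d in cursor if d["score"] == 6)
top3 = list(islice(high, 3))
print([d["id"] for d in top3])  # [6, 13, 20]
print(cursor.batches_fetched)   # 1: the other 99 batches were never requested
```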
PostgreSQL is an object-relational database management system for the persistent storage and retrieval of structured data. It runs as an ACID-compliant database server and uses standard SQL, maintaining data consistency and reliability across large application datasets. The system distinguishes itself through an extensible architecture that supports custom data types, operators, and indexing methods. It uses multi-version concurrency control (MVCC) so that reads and writes proceed concurrently without blocking each other.
Evaluates multiple execution plans using statistical data to determine the most efficient path for retrieving information.
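A cost-based planner estimates how many rows a predicate will match (its selectivity, which PostgreSQL derives from statistics gathered by `ANALYZE`) and then prices each candidate plan. The sketch below compares a sequential scan with an index scan using the default values of PostgreSQL's cost parameters (`seq_page_cost`, `random_page_cost`, `cpu_tuple_cost`, `cpu_index_tuple_cost`, `cpu_operator_cost`), but the formulas are a deliberately simplified assumption; the real planner models correlation, caching, and much more.

```python
from dataclasses import dataclass

# Default values of PostgreSQL's planner cost settings.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005
CPU_OPERATOR_COST = 0.0025


@dataclass
class TableStats:
    pages: int  # heap pages on disk
    rows: int   # estimated row count


def seq_scan_cost(stats):
    # Read every page sequentially and evaluate the filter on every row.
    return SEQ_PAGE_COST * stats.pages + (CPU_TUPLE_COST + CPU_OPERATOR_COST) * stats.rows


def index_scan_cost(stats, selectivity):
    # Simplified: one random page read per matching row, capped at the table size.
    matches = stats.rows * selectivity
    heap_pages = min(matches, stats.pages)
    return RANDOM_PAGE_COST * heap_pages + (CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST) * matches


def choose_plan(stats, selectivity):
    # Price every candidate and keep the cheapest one.
    costs = {
        "Seq Scan": seq_scan_cost(stats),
        "Index Scan": index_scan_cost(stats, selectivity),
    }
    return min(costs, key=costs.get), costs


stats = TableStats(pages=10_000, rows=1_000_000)
print(choose_plan(stats, 0.001)[0])  # Index Scan: few rows match
print(choose_plan(stats, 0.5)[0])    # Seq Scan: random I/O for half the table costs more
```

The same query can therefore get different plans depending on the estimated selectivity, which is why stale statistics can lead the planner to a slow plan.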