Why is apache/spark recommended in the Query Optimizers category?

Implements a cost-based and rule-based optimizer to transform SQL expressions into efficient physical execution plans.

Why is pola-rs/polars recommended in the Query Optimizers category?

Optimizes query execution by filtering rows and selecting columns as close to the source as possible.
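
As a rough illustration of that idea (this is not Polars' implementation), the sketch below compares a naive plan, which materializes every row and column before filtering, with a pushed-down plan that filters rows and keeps only the requested columns during the scan. The in-memory `SOURCE` and the functions `naive_plan` and `pushed_down_plan` are hypothetical names invented for this example.

```python
# Hypothetical in-memory "source"; a real engine would scan files or a database.
SOURCE = [
    {"id": 1, "city": "Oslo", "temp": 4, "notes": "x" * 10},
    {"id": 2, "city": "Cairo", "temp": 31, "notes": "y" * 10},
    {"id": 3, "city": "Lima", "temp": 19, "notes": "z" * 10},
]


def naive_plan(pred, columns):
    """Load everything first, then filter rows, then select columns."""
    loaded = [dict(row) for row in SOURCE]  # copies every row and every column
    kept = [row for row in loaded if pred(row)]
    result = [{c: row[c] for c in columns} for row in kept]
    cells_materialized = sum(len(row) for row in loaded)
    return result, cells_materialized


def pushed_down_plan(pred, columns):
    """Filter and project inside the scan, so less data leaves the source."""
    result = [{c: row[c] for c in columns} for row in SOURCE if pred(row)]
    cells_materialized = sum(len(row) for row in result)
    return result, cells_materialized


def is_warm(row):
    return row["temp"] > 15


naive_rows, naive_cells = naive_plan(is_warm, ["city", "temp"])
pushed_rows, pushed_cells = pushed_down_plan(is_warm, ["city", "temp"])
print(naive_rows == pushed_rows, naive_cells, pushed_cells)
```

Both plans return the same rows; the pushed-down version simply materializes fewer cells, which is the effect a lazy engine aims for when it moves filters and column selection toward the source.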

Why is duckdb/duckdb recommended in the Query Optimizers category?

Dynamically selects efficient execution plans for analytical workloads at runtime.

Why is prestodb/presto recommended in the Query Optimizers category?

Provides cost-based query optimization to rewrite execution paths based on table statistics and historical data.

Why is trinodb/trino recommended in the Query Optimizers category?

Utilizes cost-based optimization to analyze metadata and statistics for generating efficient query execution plans.

Why is citusdata/citus recommended in the Query Optimizers category?

Pushes operations to worker nodes based on distribution columns to minimize data movement and maximize parallel computation.

Why is mysql/mysql-server recommended in the Query Optimizers category?

Analyzes table statistics and index availability to select the most efficient execution plan for retrieving data from complex relational structures.

Why is Tencent/matrix recommended in the Query Optimizers category?

Detects full table scans and missing prepared statements in database queries to improve retrieval speed.
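
Matrix's linter targets SQLite on mobile, and its internals are not shown here. As a loosely related sketch, SQLite's own `EXPLAIN QUERY PLAN` statement reports whether a query scans a whole table or searches an index, and Python's standard `sqlite3` module is enough to check that. The `users` table, the index, and the helper `is_full_scan` are hypothetical, and the string check is only a heuristic.

```python
import sqlite3


def is_full_scan(conn, sql, params=()):
    """Return True if any step of SQLite's plan for `sql` is a scan."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last field of each plan row is a human-readable detail string such as
    # "SCAN users" or "SEARCH users USING INDEX ..."; the exact wording varies
    # between SQLite versions, so only the leading keyword is inspected.
    return any(row[-1].startswith("SCAN") for row in plan)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# No index covers `age`, so SQLite has to read the whole table.
scan = is_full_scan(conn, "SELECT * FROM users WHERE age > ?", (30,))
# The equality test on `email` can use idx_users_email instead.
search = is_full_scan(conn, "SELECT * FROM users WHERE email = ?", ("a@example.com",))
print(scan, search)
```

The `?` placeholders keep the statements parameterized, which is the same habit the "missing prepared statements" check is meant to encourage.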

Why is manticoresoftware/manticoresearch recommended in the Query Optimizers category?

Implements a cost-based optimizer that uses data statistics and secondary indexes to determine the most efficient execution plan.

Why is StarRocks/starrocks recommended in the Query Optimizers category?

Implements a cost-based optimizer that determines the most efficient execution plan using table statistics.
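
Several of the answers above mention cost-based optimization driven by table statistics. The toy sketch below shows the general shape of that idea and is not the optimizer of any project listed here: it enumerates left-deep join orders for three hypothetical tables, estimates intermediate result sizes from made-up row counts and join selectivities, and picks the cheapest order. Every name and number in it is an assumption for illustration.

```python
from itertools import permutations

# Hypothetical statistics: row counts per table and selectivity per join pair.
ROW_COUNTS = {"orders": 1_000_000, "customers": 50_000, "regions": 10}
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "regions"}): 1 / 10,
}


def plan_cost(order):
    """Sum of estimated intermediate result sizes for a left-deep join order."""
    joined = [order[0]]
    rows = float(ROW_COUNTS[order[0]])
    cost = 0.0
    for table in order[1:]:
        rows *= ROW_COUNTS[table]
        # Apply each join predicate linking the new table to tables already
        # joined; pairs without a predicate behave like a cross join (1.0).
        for other in joined:
            rows *= SELECTIVITY.get(frozenset({table, other}), 1.0)
        joined.append(table)
        cost += rows
    return cost


def cheapest_order(tables):
    """Exhaustively pick the join order with the lowest estimated cost."""
    return min(permutations(tables), key=plan_cost)


best = cheapest_order(["orders", "customers", "regions"])
print(best, plan_cost(best))
```

With these statistics the cheapest orders join the two small tables first and bring in `orders` last, because that keeps the intermediate result small. Production optimizers use richer cost models and prune the search space rather than enumerating every permutation.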

17 repositories

Awesome GitHub Repositories: Query Optimizers

Utilities for improving database query performance and data retrieval.

Distinguishing note: Focuses on performance tuning for data lists rather than general database management.

Explore 17 awesome GitHub repositories matching data & databases · Query Optimizers.


apache/spark
43,467 stars
Apache Spark is a unified distributed data processing engine designed for large-scale data analysis and computation graphs. It functions as a distributed machine learning framework, a graph processing system, a real-time stream processor, and a SQL analytics engine. The system enables the execution of distributed SQL querying, large-scale graph analysis, and real-time stream analytics across clusters of machines. It also provides a scalable environment for implementing machine learning algorithms and predictive model development on massive datasets. The engine incorporates relational query e
Implements a cost-based and rule-based optimizer to transform SQL expressions into efficient physical execution plans.
Language: Scala. Tags: big-data, java, jdbc

pola-rs/polars
38,855 stars
Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorized operations. The engine is built to handle large-scale tabular datasets, providing both local and distributed analytical runtimes that scale from single-machine environments to multi-node clusters. The project distinguishes itself through a sophisticated lazy query engine that constructs abstract e
Optimizes query execution by filtering rows and selecting columns as close to the source as possible.
Language: Rust. Tags: arrow, dataframe, dataframe-library

duckdb/duckdb
38,805 stars
DuckDB is an in-process analytical database engine designed to run directly within an application process. As a zero-dependency, embedded system, it provides enterprise-grade SQL data processing capabilities without the overhead of managing a dedicated database server. It is built to handle complex analytical and aggregation tasks by storing and retrieving information in columns, allowing for high-performance relational data manipulation. The engine distinguishes itself through a columnar vectorized execution model that maximizes CPU cache efficiency during query operations. It employs adapti
Dynamically selects efficient execution plans for analytical workloads at runtime.
Language: C++. Tags: analytics, database, embedded-database

prestodb/presto
16,711 stars
Presto is a distributed SQL query engine designed for high-performance analytical processing across heterogeneous data sources. It functions as a data federation platform and massively parallel processing engine, allowing users to execute interactive queries against diverse storage systems without requiring data migration. By mapping remote metadata and structures to a unified relational namespace, it enables seamless cross-platform analysis through a standard SQL interface. The engine distinguishes itself through a pluggable connector architecture and a shared-nothing distributed processing
Provides cost-based query optimization to rewrite execution paths based on table statistics and historical data.
Language: Java. Tags: big-data, data, hadoop

trinodb/trino
12,952 stars
Trino is a distributed SQL query engine designed for large-scale data analytics. It functions as a data federation platform, providing a unified interface that allows users to execute complex analytical queries across multiple heterogeneous data sources simultaneously without requiring data movement or transformation. The engine utilizes a massively parallel processing architecture to scale compute resources across clusters for high-speed data retrieval. It distinguishes itself through a cost-based query optimizer that analyzes metadata to determine efficient execution plans, alongside dynami
Utilizes cost-based optimization to analyze metadata and statistics for generating efficient query execution plans.
Language: Java. Tags: analytics, big-data, data-science

citusdata/citus
12,562 stars
Citus is a PostgreSQL extension that transforms a standard database into a distributed system. It functions as a sharding framework and distributed SQL engine, enabling horizontal scaling by partitioning tables across a cluster of nodes. By utilizing a coordinator-worker topology, the system manages metadata and routes queries to the appropriate nodes, allowing for parallel execution of complex operations across distributed data shards. The platform distinguishes itself through its specialized support for multi-tenant architectures and real-time analytical processing. It enables tenant-based
Pushes operations to worker nodes based on distribution columns to minimize data movement and maximize parallel computation.
Language: C. Tags: citus, citus-extension, database

mysql/mysql-server
12,297 stars
MySQL Server is a relational database management system designed to organize and store structured information. It functions as a comprehensive SQL server platform that provides reliable transactional integrity and high-performance query execution for enterprise data management. The system distinguishes itself through a pluggable storage engine architecture that decouples logical query processing from physical data storage, allowing for specialized handling of diverse workloads. It maintains data consistency and high concurrency through multi-version concurrency control and write-ahead logging
Analyzes table statistics and index availability to select the most efficient execution plan for retrieving data from complex relational structures.
Language: C++

Tencent/matrix
12,020 stars
Matrix is a suite of mobile application performance management and analysis tools. It provides a plugin-based monitoring system for capturing crashes, lags, and memory leaks, alongside a static binary auditor for reducing installation package size and a bytecode instrumentation tool for performance tracking. The project distinguishes itself through native memory debugging and a SQLite query linter that identifies inefficient database patterns. It employs native interception techniques to detect memory leaks and heap corruption without requiring source code recompilation, and uses a custom run
Detects full table scans and missing prepared statements in database queries to improve retrieval speed.
Language: Java. Tags: android, apm-client, wechat

manticoresoftware/manticoresearch
11,819 stars
Manticoresearch is a high-performance search engine and database designed for indexing and retrieving large datasets. It functions as a full-text search engine, a vector search database, and a SQL-based search database, providing a distributed search cluster architecture. The system provides an alternative to the Elasticsearch stack, offering a compatible API for indexing and searching structured and unstructured data. It distinguishes itself by supporting multiple retrieval methods, including vector matching for similarity search, geospatial queries, and traditional full-text ranking. The p
Implements a cost-based optimizer that uses data statistics and secondary indexes to determine the most efficient execution plan.
Language: C++. Tags: api, bm25, cpp

StarRocks/starrocks
11,789 stars
StarRocks is a distributed SQL OLAP database engine designed for real-time analytics and high-performance multi-dimensional analysis. It functions as a data lakehouse query engine that enables SQL execution across large datasets and external open table formats without requiring local data imports. The system employs a shared-nothing distributed architecture and utilizes the MySQL protocol to integrate with business intelligence tools. It maintains real-time data consistency through a primary key upsert model and accelerates query response times using vectorized execution and cost-based optimi
Implements a cost-based optimizer that determines the most efficient execution plan using table statistics.
Language: Java. Tags: analytics, big-data, cloudnative

pingcap/awesome-database-learning
10,672 stars
This project is a curated collection of academic papers, books, and technical resources designed for studying the architecture and implementation of database management systems. It serves as a comprehensive educational guide for engineers and researchers looking to understand the fundamental principles behind modern data storage and retrieval. The repository distinguishes itself by providing structured learning paths across critical database domains, including the design of persistent storage engines, the mechanics of query optimization, and the complexities of distributed transaction managem
Offers technical resources on cost-based query optimization strategies using statistical data to determine efficient execution paths.
Tags: awesome, awesome-list, blogs

yugabyte/yugabyte-db
10,349 stars
YugabyteDB is a distributed SQL database and relational data store designed for horizontal scalability and high availability across multiple nodes or regions. It functions as a cloud-native system that ensures continuous availability and supports PostgreSQL compatible query languages and drivers. The system includes specialized capabilities as a vector database for AI, utilizing high-dimensional indexing to perform similarity searches. It is engineered as a multi-region cloud database that synchronizes data across different geographic locations to maintain global availability. The project co
Provides a cost-based optimizer that analyzes data statistics to select the most efficient query execution plans.
Language: C. Tags: cloud-native, cpp, database

microsoft/Mastering-GitHub-Copilot-for-Paired-Programming
7,976 stars
This project is a collection of educational resources and curricula designed for mastering AI pair programming and prompt engineering. It provides a structured training course and instructional materials for integrating AI assistants into the software development lifecycle. The materials cover the use of large language models to modernize legacy code and translate applications between programming languages. It includes a specific guide for crafting natural language queries to generate code and automate development workflows. The content addresses a broad range of capabilities, including AI-a
Offers techniques for using advanced AI prompting to refine and optimize complex database queries.
Language: Python. Tags: copilot, csharp, dotnet

apache/hudi
6,097 stars
Apache Hudi is an open-source table format that brings ACID transactions, incremental processing, and multi-modal indexing to data lakes. It provides atomic commits with snapshot isolation, rollback, and optimistic concurrency control for reliable data lake operations, while supporting upserts, record-level updates, and deletions in large analytical datasets. The project distinguishes itself through a timeline-based architecture that coordinates all write operations, enabling features like time-travel querying, incremental change streaming, and multi-modal query views that include snapshot, i
Serves snapshot queries using only columnar storage for high performance on analytical workloads.
Language: Java. Tags: apacheflink, apachehudi, apachespark

apache/hive
6,012 stars
Apache Hive is a SQL-on-Hadoop data warehouse that enables querying and managing petabytes of data stored in distributed storage such as HDFS and cloud storage services. It provides a familiar SQL interface for batch analytics and reporting, supported by a core set of components including the HiveServer2 Thrift service for remote query execution, the Hive Metastore Service for central metadata management, the Hive ACID Transaction Engine for concurrent read-write operations, and the Hive LLAP Interactive Engine for low-latency analytical processing. The WebHCat REST API offers an HTTP interfac
Uses a cost-based optimizer with table statistics and materialized views for query planning.
Language: Java. Tags: apache, big-data, database

apache/calcite
5,139 stars
Calcite is a framework for parsing, optimizing, and translating SQL queries into relational algebra for execution on a variety of data sources. It functions as a multi-source query engine, a SQL parsing library, and a relational algebra optimizer. The project provides a cost-based optimization engine that transforms logical query plans into efficient physical execution plans using pluggable rules. It uses translation adapters to convert standard SQL queries into the native formats of external databases and messaging systems, enabling data federation across heterogeneous storage systems. The system covers the full query lifecycle, including SQL parsing and validation against schemas, translation of expressions into algebraic operators, and selection of efficient execution plans. It also includes a command-line interface for running queries and managing data source connections.
Implements a cost-based optimizer that estimates resource costs to select the most efficient physical execution plans.
Language: Java

h2database/h2database
4,607 stars
H2 is a JDBC-compliant relational database management system written in Java. It functions as an embeddable SQL database that can run directly within an application process to remove network latency, or as an in-memory database for high-performance volatile storage. It also includes a web-based console for executing SQL commands and administering schemas. The system is characterized by its flexible deployment modes, including a standalone server mode for remote TCP/IP access and a mixed mode for simultaneous local and remote connectivity. It features a dialect emulation layer and compatibilit
Uses table statistics to determine the most efficient physical execution path for SQL statements.
Language: Java. Tags: database, java, jdbc

