17 Repos
The process of analyzing database execution plans to optimize query performance.
Distinct from Database Query Execution: Focuses on the visualization and analysis of the plan (joins, index usage) rather than the act of executing the query.
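As a rough illustration of what plan analysis means in practice, the sketch below reads a query plan and checks whether an index is used. It relies on SQLite's `EXPLAIN QUERY PLAN` through Python's standard `sqlite3` module, not on any repository listed here; the table, index, and helper names (`orders`, `idx_orders_customer`, `plan_details`, `uses_index`) are hypothetical and chosen only for the example.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row is (id, parent, notused, detail); only the detail text is needed.
    return [row[3] for row in rows]


def uses_index(details):
    """True if any plan step reports an index-based lookup."""
    return any(
        "USING INDEX" in d or "USING COVERING INDEX" in d for d in details
    )


# Hypothetical schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
query = "SELECT total FROM orders WHERE customer_id = 42"

# Without an index on customer_id the planner has to scan the whole table.
before = plan_details(conn, query)

# After the index is created, the same query can be answered by an index search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan_details(conn, query)

print("before:", before, "uses index:", uses_index(before))
print("after: ", after, "uses index:", uses_index(after))
```

The exact wording of the plan text varies between SQLite versions, so the check looks for the `USING INDEX` phrase instead of comparing whole strings. Other engines expose comparable information through their own `EXPLAIN` variants, with different output formats.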
Nebula is a distributed graph database designed for storing and querying massive volumes of interconnected vertices and edges across a horizontally scalable cluster. It functions as a Kubernetes-native database and a distributed graph analytics engine, utilizing a Raft-based distributed store to ensure strong consistency and high availability. The system features an OpenCypher query engine for performing complex graph traversals and pattern matching. It distinguishes itself with a decoupled compute-storage architecture and a shared-nothing distributed design, allowing query processing and dat
Provides tools to analyze query execution plans and profiling data to identify and resolve performance bottlenecks.
Apache DataFusion is an extensible, columnar SQL query engine that runs embedded within a host application without requiring a separate server process. It processes data in columnar batches using Apache Arrow for memory-efficient analytics, and can scale analytic workloads across multiple nodes for parallel execution. The engine supports both SQL and DataFrame queries through a modular, streaming architecture that allows custom operators, data sources, functions, and optimizer rules. The engine distinguishes itself through its modular extension framework, which enables building custom query e
Displays the physical plan and execution metrics of a query using EXPLAIN and EXPLAIN ANALYZE.
Soar is a suite of specialized tools designed for analyzing MySQL performance, advising on indexing, and optimizing SQL syntax. It functions as a performance analyzer, index advisor, and query optimizer to identify bottlenecks and suggest structural improvements for faster execution. The project distinguishes itself through a system for rewriting SQL statements into optimized equivalent versions using custom heuristic rules and patterns. It also features a dedicated index advisor that evaluates query patterns and database metadata to recommend the creation of new indexes. Its broader capabil
Analyzes database execution plans and explain output to detect inefficient access types and key usage.
This project is a comprehensive educational resource and curriculum focused on site reliability engineering, distributed systems, and infrastructure operations. It provides technical guides, a systems engineering course, and instructional manuals designed to teach the principles of managing large-scale computing environments. The curriculum covers high-level architectural design for scalability and resilience, including fault-tolerant infrastructure, high-availability patterns, and microservices decomposition. It emphasizes the practical application of site reliability engineering through the
Teaches how to generate and visualize execution plans to identify bottlenecks in table joins and index usage.
Azure Data Studio is a cross-platform SQL database management IDE used for writing queries, managing schemas, and administering relational databases. It functions as a comprehensive environment for relational database management, providing a structured interface for executing SQL queries and browsing database objects. The platform is distinguished by its interactive data notebooks, which combine executable code cells, narrative text, and visualizations for data analysis. It also includes specialized tools for database migration, allowing users to assess and transfer schemas and data from on-p
Visualizes estimated and actual execution plans graphically to identify expensive operators and optimize performance.
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Generates detailed breakdowns of execution steps to help optimize complex joins and distributed data reshaping.
SparkInternals is a technical reference and architecture guide that describes in detail the internal design and implementation of the Apache Spark distributed computing engine. It serves as an analysis of big data engines and focuses on how the system manages cluster execution and the interaction between driver nodes, executors, and workers. The project provides a detailed breakdown of how logical plans are converted into physical execution stages. It specifically analyzes the mechanics of data shuffle operations, memory management, and the coordination of distributed job scheduling. The documentation covers a broad range of distributed computing features, including query execution planning, data dependency management, and in-memory caching strategies. It also examines task distribution, parallel execution, and the processes for fault recovery and data persistence.
Analyzes how execution flows are decomposed into jobs and stages to visualize concrete compute operations.
Octosql is a federated SQL query engine, a data transformer, and a streaming SQL processor. It allows a single SQL statement to run across multiple heterogeneous data sources, including different database types and file formats, in order to merge and transform the results into one unified dataset. The system distinguishes itself by treating CSV, JSONLines, and Parquet files as virtual tables and by using a plugin-based architecture to extend connectivity to external storage systems. It functions as a streaming processor for unbounded data streams and uses watermarks, retractions, and tumbling windows to maintain consistency when events arrive out of order. It also serves as a SQL data generator that can produce synthetic datasets and record streams through table-valued functions. The engine includes cross-source joins and multi-source analytics, optimized through source-side predicate push-down to reduce data transfer. It manages complex data through a static type system with union types and offers observability by visualizing query execution plans.
Generates visual representations of execution plans to verify predicate push-down and optimization logic.
Pigsty is a comprehensive database infrastructure orchestration platform designed to automate the full lifecycle of high-availability PostgreSQL clusters. It functions as an infrastructure-as-code framework that manages cluster coordination, node provisioning, and service discovery through idempotent playbooks. By integrating distributed consensus mechanisms, the platform ensures automated failover and consistent state enforcement across diverse environments, including bare metal and virtualized infrastructure. The platform distinguishes itself through a robust suite of operational capabiliti
Displays database execution plans as visual diagrams to help developers identify and resolve performance bottlenecks.
Eko is a framework for designing and deploying agent-based workflows, comprising an LLM agent workflow orchestrator and a browser automation engine. It provides a server-side process manager for executing system operations and managing local files, as well as a human-in-the-loop agent controller for manual oversight and control during automated decision processes. The system coordinates collaboration between multiple agents through role-based partitioning and workflow orchestration, dividing complex tasks into distinct roles and managing execution handoffs. It integrates the Model Context Protocol to standardize connections between agents and external tools or data sources. The platform includes features for headless browser automation, web scraping, and the automation of repetitive tasks using loop-based event listeners. It also offers execution plan streaming to visualize an agent's internal planning process in real time.
Features execution plan streaming to visualize an agent's internal planning process in real-time.
Pigsty is a full-stack orchestration suite for deploying, monitoring, and managing high-availability PostgreSQL clusters and their supporting infrastructure. It functions as a cluster management platform and high-availability suite that automates failover, manages virtual IPs, and ensures data consistency through distributed consensus. The project distinguishes itself by providing a comprehensive database infrastructure-as-code framework and a dedicated observability stack. It incorporates a backup and recovery manager supporting point-in-time recovery via S3-compatible object storage, alongs
Renders PostgreSQL EXPLAIN output into a visual format to identify query performance bottlenecks.
H2 is a JDBC-compliant relational database management system written in Java. It functions as an embeddable SQL database that can run directly inside an application process to eliminate network latency, or as an in-memory database for fast, volatile storage. It also includes a web-based console for executing SQL commands and managing schemas. The system is distinguished by flexible deployment modes, including a standalone server mode for remote TCP/IP access and a mixed mode for simultaneous local and remote connectivity. It has a dialect emulation layer and compatibility modes that allow it to mimic the behavior and syntax of other database systems. The engine offers a broad range of features, including ACID transactions with Multi-Version Concurrency Control (MVCC), support for geospatial data and JSON, and advanced analytical window functions. It includes tools for data protection through compressed backups, SQL script restoration, and off-heap memory management for large datasets. The database integrates into applications through standard JDBC drivers and connection URLs.
Inspects internal execution plans and scan counts to optimize index usage and query performance.
The MongoDB Python Driver is a client library and NoSQL database client used to execute CRUD operations and manage data within MongoDB databases using the Python programming language. It serves as a database connectivity library that handles authentication and connection pooling, while also providing a vector search client for managing embedding indexes and retrieving data based on semantic similarity. The driver supports both synchronous and asynchronous database driver models to perform non-blocking I/O operations and stream data from database clusters. It distinguishes itself through speci
Provides access to execution plans and performance statistics to optimize database query performance.
Kvrocks is a distributed key-value store and Redis-compatible NoSQL database. It utilizes a RocksDB storage engine to provide disk-based persistence, allowing for high-capacity data storage with reduced memory costs compared to in-memory systems. The system functions as a vector database and full-text search engine, supporting nearest-neighbor searches on vector embeddings and complex document queries via text matching. It employs a proxyless cluster architecture with slot-based routing to distribute data and scale capacity across multiple nodes. The platform covers a wide range of data mana
Generates and analyzes query execution plans to optimize data retrieval and filtering.
Memgraph is an in-memory, distributed graph database designed for high-performance labeled property graph management. It utilizes a Cypher query engine for declarative data retrieval and manipulation, providing a scalable knowledge graph backend that integrates vector search and graph traversals. The system distinguishes itself as a real-time graph analytics platform, employing native C++ and CUDA implementations to execute complex network analysis and dynamic community detection on streaming data. It provides specialized support for AI integration, including GraphRAG capabilities, the constr
Generates detailed query execution plans to identify and resolve performance bottlenecks.
pgdog is a PostgreSQL sharding proxy, distributed SQL router, and connection pooler. It is designed to enable horizontal data distribution by splitting tables and indices across multiple independent servers to scale storage and processing capacity. The project distinguishes itself through online resharding capabilities, using logical replication to move data between shards without application downtime. It supports multiple routing strategies, including hash, list, and range-based query routing, and manages distributed atomic transactions using a two-phase commit process to ensure consistency
Retrieves and analyzes execution plans for slow queries to assist in performance tuning.
mcp-context-forge is a Model Context Protocol federation gateway that unifies diverse AI tool servers and APIs into a single consistent interface for discovery and execution. It acts as a centralized proxy that aggregates multiple servers and APIs, allowing AI agents to access and invoke a unified set of tools, prompts, and resources. The project distinguishes itself through a multi-protocol translation bridge that converts communication between standard I/O, SSE, gRPC, and REST to enable interoperability between disparate tool servers. It includes a comprehensive LLM evaluation framework for
Inspects database execution plans and table scan statistics to identify and optimize slow queries.