17 repositories
The process of analyzing database execution plans to optimize query performance.
Distinct from Database Query Execution: Focuses on the visualization and analysis of the plan (joins, index usage) rather than the act of executing the query.
Explore 17 GitHub repositories matching Data & Databases · Execution Plan Analysis.
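As a concrete illustration of what plan analysis looks like in practice, here is a minimal sketch using SQLite's `EXPLAIN QUERY PLAN` through Python's standard `sqlite3` module. SQLite is only a stand-in chosen because it needs no server; the `orders` table, the `idx_orders_customer` index, and the `plan_details` helper are hypothetical names invented for this example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); only the detail text is kept.
    return [row[3] for row in rows]


# Hypothetical in-memory table used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

query = "SELECT * FROM orders WHERE customer_id = ?"

# Plan before any secondary index exists: the engine has to scan the table.
before = plan_details(conn, query, (42,))

# Add an index on the filter column and ask for the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan_details(conn, query, (42,))

print("before:", before)
print("after: ", after)
```

Comparing the two plans is the core of the workflow: the first reports a table scan, the second a search that names the index, which is the kind of difference the tools listed here surface or visualize.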
Nebula is a distributed graph database designed for storing and querying massive volumes of interconnected vertices and edges across a horizontally scalable cluster. It functions as a Kubernetes-native database and a distributed graph analytics engine, utilizing a Raft-based distributed store to ensure strong consistency and high availability. The system features an OpenCypher query engine for performing complex graph traversals and pattern matching. It distinguishes itself with a decoupled compute-storage architecture and a shared-nothing distributed design, allowing query processing and dat
Provides tools to analyze query execution plans and profiling data to identify and resolve performance bottlenecks.
Apache DataFusion is an extensible, columnar SQL query engine that runs embedded within a host application without requiring a separate server process. It processes data in columnar batches using Apache Arrow for memory-efficient analytics, and can scale analytic workloads across multiple nodes for parallel execution. The engine supports both SQL and DataFrame queries through a modular, streaming architecture that allows custom operators, data sources, functions, and optimizer rules. The engine distinguishes itself through its modular extension framework, which enables building custom query e
Displays the physical plan and execution metrics of a query using EXPLAIN and EXPLAIN ANALYZE.
Soar is a suite of specialized tools designed for analyzing MySQL performance, advising on indexing, and optimizing SQL syntax. It functions as a performance analyzer, index advisor, and query optimizer to identify bottlenecks and suggest structural improvements for faster execution. The project distinguishes itself through a system for rewriting SQL statements into optimized equivalent versions using custom heuristic rules and patterns. It also features a dedicated index advisor that evaluates query patterns and database metadata to recommend the creation of new indexes. Its broader capabil
Analyzes database execution plans and explain output to detect inefficient access types and key usage.
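The kind of check described above can be sketched in a few lines of Python. This is not Soar's actual rule set; it is an illustrative assumption that flags two well-known signals in MySQL `EXPLAIN` output: an access `type` of `ALL` (full table scan) or `index` (full index scan), and a row where `possible_keys` lists candidates but `key` is empty. The `flag_inefficient` function and the sample rows are hypothetical.

```python
def flag_inefficient(rows):
    """Return (table, finding) pairs for suspicious EXPLAIN rows."""
    findings = []
    for row in rows:
        access_type = row.get("type")
        if access_type == "ALL":
            findings.append((row["table"], "full table scan"))
        elif access_type == "index":
            findings.append((row["table"], "full index scan"))
        # Candidate indexes were available, yet the optimizer chose none.
        if row.get("possible_keys") and row.get("key") is None:
            findings.append((row["table"], "candidate index not used"))
    return findings


# Hypothetical rows shaped like MySQL EXPLAIN output (only the columns used here).
sample_plan = [
    {"table": "orders", "type": "ALL", "possible_keys": None, "key": None},
    {"table": "customers", "type": "eq_ref", "possible_keys": "PRIMARY", "key": "PRIMARY"},
    {"table": "items", "type": "ALL", "possible_keys": "idx_sku", "key": None},
    {"table": "events", "type": "index", "possible_keys": None, "key": "idx_created"},
]

findings = flag_inefficient(sample_plan)
for table, finding in findings:
    print(f"{table}: {finding}")
```

A real advisor would also weigh estimated row counts and the `Extra` column before recommending an index; this sketch only shows the shape of the rule-based pass.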
This project is a comprehensive educational resource and curriculum focused on site reliability engineering, distributed systems, and infrastructure operations. It provides technical guides, a systems engineering course, and instructional manuals designed to teach the principles of managing large-scale computing environments. The curriculum covers high-level architectural design for scalability and resilience, including fault-tolerant infrastructure, high-availability patterns, and microservices decomposition. It emphasizes the practical application of site reliability engineering through the
Teaches how to generate and visualize execution plans to identify bottlenecks in table joins and index usage.
Azure Data Studio is a cross-platform SQL database management IDE used for writing queries, managing schemas, and administering relational databases. It functions as a comprehensive environment for relational database management, providing a structured interface for executing SQL queries and browsing database objects. The platform is distinguished by its interactive data notebooks, which combine executable code cells, narrative text, and visualizations for data analysis. It also includes specialized tools for database migration, allowing users to assess and transfer schemas and data from on-p
Visualizes estimated and actual execution plans graphically to identify expensive operators and optimize performance.
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Generates detailed breakdowns of execution steps to help optimize complex joins and distributed data reshaping.
SparkInternals is a technical reference and architecture guide detailing the internal design and implementation of the Apache Spark distributed computing engine. It serves as a study of big data engine analysis, focusing on how the system manages cluster execution and the interaction between driver nodes, executors, and workers. The project provides a detailed breakdown of how logical plans are converted into physical execution stages. It specifically analyzes the mechanics of data shuffle operations, memory management, and the coordination of distributed job scheduling. The documentation co
Analyzes how execution flows are decomposed into jobs and stages to visualize concrete compute operations.
Octosql is a distributed SQL query engine, data transformer, and streaming SQL processor. It lets users run a single SQL query across multiple disparate data sources, including different database types and file formats, to join the results and transform them into one unified dataset. The system is distinguished by treating CSV, JSONLines, and Parquet files as virtual tables, and it uses a plugin-based architecture to extend connectivity to external storage engines. It functions as a stream processor for unbounded data streams, using watermarks, retractions, and tumbling windows to maintain consistency over out-of-order events. In addition, it works as a SQL data generator capable of producing synthetic datasets and record streams through table-valued functions. The engine includes capabilities for cross-source joins and multi-source analysis, and it improves performance by applying predicate push-down at the source to reduce data transfer. The system handles complex data through a static type system with union types and provides observability by visualizing query execution plans.
Generates visual representations of execution plans to verify predicate push-down and optimization logic.
Pigsty is a comprehensive database infrastructure orchestration platform designed to automate the full lifecycle of high-availability PostgreSQL clusters. It works as an infrastructure-as-code framework that manages cluster orchestration, node provisioning, and service discovery through idempotent playbooks. By integrating distributed consensus mechanisms, the platform ensures automated failover and consistent state enforcement across diverse environments, including bare-metal hardware and virtualized infrastructure. The platform offers a robust set of operational capabilities that extend beyond standard database management. It features a built-in observability pipeline that collects metrics, logs, and traces into centralized dashboards for real-time performance monitoring and diagnostic analysis. In addition, it provides a migration framework that emulates proprietary wire protocols and SQL syntax, allowing legacy enterprise database workloads to be consolidated into modern relational environments. The system covers a broad functional area, including advanced storage management with copy-on-write cloning for rapid deployment, and multi-database orchestration that coordinates relational engines with caching and object storage. It also integrates security hardening, automated backup and recovery, and traffic routing through layered proxies to decouple client connections from the underlying cluster topology. The project is distributed as a self-contained package mirror, enabling consistent deployment and dependency management in secure or air-gapped environments.
Displays database execution plans as visual diagrams to help developers identify and resolve performance bottlenecks.
Eko is a framework for designing and deploying agentic workflows, featuring an LLM agent workflow orchestrator and a browser automation engine. It provides a server-side process manager for executing system-level operations and managing local files, along with a human-in-the-loop agent controller for manual supervision and guidance during automated decision-making. The system coordinates multi-agent collaboration through role-based decomposition and workflow orchestration, splitting complex tasks into distinct roles and managing execution handoffs. It integrates the Model Context Protocol to standardize communication between agents and external tools or data sources. The platform includes capabilities for headless browser automation, web scraping, and automating repetitive tasks using loop-based event listening. It also features execution plan streaming to visualize the agent's internal planning process in real time.
Features execution plan streaming to visualize an agent's internal planning process in real time.
Pigsty is a full-stack orchestration suite for deploying, monitoring, and managing high-availability PostgreSQL clusters and their supporting infrastructure. It functions as a cluster management platform and high-availability suite that automates failover, manages virtual IPs, and ensures data consistency through distributed consensus. The project distinguishes itself by providing a comprehensive database infrastructure-as-code framework and a dedicated observability stack. It incorporates a backup and recovery manager supporting point-in-time recovery via S3-compatible object storage, alongs
Renders PostgreSQL EXPLAIN output into a visual format to identify query performance bottlenecks.
H2 is a JDBC-compliant relational database management system written in Java. It functions as an embeddable SQL database that can run directly within an application process to remove network latency, or as an in-memory database for high-performance volatile storage. It also includes a web-based console for executing SQL commands and administering schemas. The system is characterized by its flexible deployment modes, including a standalone server mode for remote TCP/IP access and a mixed mode for simultaneous local and remote connectivity. It features a dialect emulation layer and compatibilit
Inspects internal execution plans and scan counts to optimize index usage and query performance.
The MongoDB Python Driver is a client library and NoSQL database client used to execute CRUD operations and manage data within MongoDB databases using the Python programming language. It serves as a database connectivity library that handles authentication and connection pooling, while also providing a vector search client for managing embedding indexes and retrieving data based on semantic similarity. The driver supports both synchronous and asynchronous database driver models to perform non-blocking I/O operations and stream data from database clusters. It distinguishes itself through speci
Provides access to execution plans and performance statistics to optimize database query performance.
Kvrocks is a distributed key-value store and Redis-compatible NoSQL database. It utilizes a RocksDB storage engine to provide disk-based persistence, allowing for high-capacity data storage with reduced memory costs compared to in-memory systems. The system functions as a vector database and full-text search engine, supporting nearest-neighbor searches on vector embeddings and complex document queries via text matching. It employs a proxyless cluster architecture with slot-based routing to distribute data and scale capacity across multiple nodes. The platform covers a wide range of data mana
Generates and analyzes query execution plans to optimize data retrieval and filtering.
Memgraph is an in-memory, distributed graph database designed for high-performance labeled property graph management. It utilizes a Cypher query engine for declarative data retrieval and manipulation, providing a scalable knowledge graph backend that integrates vector search and graph traversals. The system distinguishes itself as a real-time graph analytics platform, employing native C++ and CUDA implementations to execute complex network analysis and dynamic community detection on streaming data. It provides specialized support for AI integration, including GraphRAG capabilities, the constr
Generates detailed query execution plans to identify and resolve performance bottlenecks.
pgdog is a PostgreSQL sharding proxy, distributed SQL router, and connection pooler. It is designed to enable horizontal data distribution by splitting tables and indices across multiple independent servers to scale storage and processing capacity. The project distinguishes itself through online resharding capabilities, using logical replication to move data between shards without application downtime. It supports multiple routing strategies, including hash, list, and range-based query routing, and manages distributed atomic transactions using a two-phase commit process to ensure consistency
Retrieves and analyzes execution plans for slow queries to assist in performance tuning.
mcp-context-forge is a Model Context Protocol federation gateway that unifies diverse AI tool servers and APIs into a single consistent interface for discovery and execution. It acts as a centralized proxy that aggregates multiple servers and APIs, allowing AI agents to access and invoke a unified set of tools, prompts, and resources. The project distinguishes itself through a multi-protocol translation bridge that converts communication between standard I/O, SSE, gRPC, and REST to enable interoperability between disparate tool servers. It includes a comprehensive LLM evaluation framework for
Inspects database execution plans and table scan statistics to identify and optimize slow queries.