33 repositories
Engines that process standard SQL commands while maintaining strict consistency and isolation.
Explore 33 awesome GitHub repositories matching data & databases · SQL Query Execution Engines.
DuckDB is an embedded analytical SQL database and OLAP database management system. It works as a data engine for Parquet and CSV files, letting users run complex SQL queries on large datasets without a separate server process. The system is designed for local analytical processing and embedded data-science workflows. It can query and analyze Parquet and CSV files directly on disk, so the data does not have to be loaded into a persistent database first. The engine provides high-performance analytical SQL execution, including support for window functions and nested subqueries. It uses a columnar storage layout and vectorized query execution to handle large-scale data manipulation and exploration. The database is accessible through a standalone command-line interface and through bindings for Python, R, Java, and Wasm.
Provides a high-performance SQL execution engine supporting advanced window functions and nested subqueries.
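The columnar layout and vectorized execution described above can be illustrated with a small sketch. It is not DuckDB's engine or API: it assumes a table held as one Python list per column, and the `BATCH_SIZE` constant and `filtered_sum` function are hypothetical. The point is that a query touches only the columns it references and evaluates its filter over a whole batch at a time.

```python
from typing import Dict

BATCH_SIZE = 4  # hypothetical vector size; real engines use much larger batches


def filtered_sum(table: Dict[str, list], filter_col: str, threshold: float, sum_col: str) -> float:
    """Compute SUM(sum_col) WHERE filter_col > threshold, one batch at a time."""
    total = 0.0
    n = len(table[filter_col])
    for start in range(0, n, BATCH_SIZE):
        end = min(start + BATCH_SIZE, n)
        # Only the two referenced columns are read; other columns are never touched.
        keys = table[filter_col][start:end]
        values = table[sum_col][start:end]
        # Build a selection mask for the whole batch, then aggregate the selected values.
        mask = [k > threshold for k in keys]
        total += sum(v for v, keep in zip(values, mask) if keep)
    return total


table = {
    "price": [5.0, 12.0, 7.5, 20.0, 3.0, 15.0],
    "qty": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "note": ["a", "b", "c", "d", "e", "f"],
}
print(filtered_sum(table, "price", 6.0, "qty"))
```

The `note` column is never read, which is the main I/O saving a column store gets over a row store for queries like this.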
Cockroach is a distributed SQL database designed to scale horizontally across multiple nodes while maintaining strict ACID compliance and global data consistency. It functions as a relational database engine that automatically partitions data into ranges, rebalancing them across a cluster to accommodate growing storage and throughput requirements. By utilizing a distributed consensus protocol, the system ensures that all nodes agree on the order of operations, providing fault tolerance and continuous availability even in the event of hardware failures. The system distinguishes itself through
Processes database commands using standard syntax while maintaining strict data consistency and isolation.
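The range-based partitioning described for Cockroach can be sketched as follows. This is a hypothetical, single-process illustration (`RangeMap` and `MAX_KEYS` are invented names), not CockroachDB's code: keys live in sorted, contiguous ranges, and a range splits in two when it grows past a threshold. Replication, consensus, and rebalancing across nodes are deliberately left out.

```python
import bisect
from typing import List

MAX_KEYS = 3  # hypothetical split threshold (a real system would use a byte-size limit)


class RangeMap:
    """Keeps sorted keys in contiguous ranges; each range is identified by its start key."""

    def __init__(self) -> None:
        self.starts: List[str] = [""]        # range i covers [starts[i], starts[i+1])
        self.ranges: List[List[str]] = [[]]  # sorted keys stored in each range

    def _locate(self, key: str) -> int:
        # The owning range is the last one whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def insert(self, key: str) -> None:
        i = self._locate(key)
        bisect.insort(self.ranges[i], key)
        if len(self.ranges[i]) > MAX_KEYS:
            # Split the overfull range at its median key.
            keys = self.ranges[i]
            mid = len(keys) // 2
            self.ranges[i:i + 1] = [keys[:mid], keys[mid:]]
            self.starts.insert(i + 1, keys[mid])

    def range_of(self, key: str) -> int:
        return self._locate(key)


rm = RangeMap()
for k in ["apple", "kiwi", "banana", "mango", "cherry", "pear"]:
    rm.insert(k)
print(rm.starts)
print(rm.ranges)
```

Because ranges are contiguous and sorted, a lookup only needs the list of start keys, which is what lets a cluster route a request to the node holding the right range.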
Cube is a semantic data layer that provides a unified framework for defining business metrics, dimensions, and relationships across diverse data sources. By acting as a headless business intelligence engine, it transforms raw data into a governed model that can be queried via SQL, REST, and GraphQL interfaces. This architecture ensures consistent data definitions and logic across all downstream analytical applications and reporting tools. The platform distinguishes itself through its integrated conversational AI capabilities, which allow users to explore data using natural language. It orches
Transforms high-level analytical requests into optimized, dialect-specific SQL queries tailored for diverse underlying database engines.
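The idea of defining a metric once and compiling it into dialect-specific SQL can be sketched like this. The model format, the `compile_query` function, and the table and column names are hypothetical and do not reflect Cube's data-model syntax; the sketch only shows two real dialect differences, identifier quoting and row limiting (`LIMIT` versus T-SQL's `TOP`).

```python
from typing import Dict

# Hypothetical semantic model: each measure and dimension is defined once.
MEASURES: Dict[str, str] = {"total_revenue": "SUM({amount})", "order_count": "COUNT(*)"}
DIMENSIONS: Dict[str, str] = {"status": "status", "amount": "amount"}
TABLE = "orders"

# Identifier quoting differs per dialect.
QUOTES = {"postgres": ('"', '"'), "mysql": ("`", "`"), "tsql": ("[", "]")}


def quote(name: str, dialect: str) -> str:
    left, right = QUOTES[dialect]
    return f"{left}{name}{right}"


def compile_query(measure: str, dimension: str, limit: int, dialect: str) -> str:
    """Expand a measure/dimension request into SQL for the target dialect."""
    dim = quote(DIMENSIONS[dimension], dialect)
    agg = MEASURES[measure].format(amount=quote(DIMENSIONS["amount"], dialect))
    select = f"{dim}, {agg} AS {quote(measure, dialect)}"
    table = quote(TABLE, dialect)
    if dialect == "tsql":
        # T-SQL limits rows with TOP instead of LIMIT.
        return f"SELECT TOP {limit} {select} FROM {table} GROUP BY {dim}"
    return f"SELECT {select} FROM {table} GROUP BY {dim} LIMIT {limit}"


print(compile_query("total_revenue", "status", 10, "postgres"))
print(compile_query("total_revenue", "status", 10, "tsql"))
```

Downstream tools ask for `total_revenue` by `status` and never see the aggregation expression, so the metric's definition stays consistent across every consumer.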
Presto is a distributed SQL query engine designed for high-performance analytical processing across heterogeneous data sources. It functions as a data federation platform and massively parallel processing engine, allowing users to execute interactive queries against diverse storage systems without requiring data migration. By mapping remote metadata and structures to a unified relational namespace, it enables seamless cross-platform analysis through a standard SQL interface. The engine distinguishes itself through a pluggable connector architecture and a shared-nothing distributed processing
Provides a high-performance distributed SQL engine for interactive analytical queries across heterogeneous data sources.
Floci is a local emulator for AWS services and cloud infrastructure designed for developing and testing applications without a live internet connection. It serves as a containerized cloud emulator and a serverless runtime emulator, allowing users to run high-fidelity replicas of cloud databases, queues, and compute services on a local machine. The project distinguishes itself by using real container images instead of simple mocks to ensure behavioral accuracy. It functions as a local API gateway simulator with proxy-based routing for REST and WebSocket APIs, and provides a serverless environm
Provides a local SQL execution engine to run queries against an emulated data lake.
Apache Druid is a real-time analytics database and distributed columnar time-series store designed for sub-second analytical queries. It functions as a data platform featuring a distributed SQL query engine and a real-time data ingestion system for moving historical and streaming data from external sources. The system is distinguished by its ability to provide low-latency analytics under high concurrency to power operational dashboards. It implements a Kerberos-secured environment for user authentication and employs a shared-nothing cluster architecture to enable horizontal scaling. The plat
Processes standard SQL and native queries across a distributed cluster to analyze large-scale datasets.
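Dashboards on a time-series store typically ask for aggregates per time bucket. A minimal sketch of that query shape, grouping events by a floored timestamp and a dimension, is shown below; `bucket_counts` and the sample events are hypothetical, and this is not Druid's query engine.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def bucket_counts(events: List[Tuple[int, str]], granularity_s: int) -> Dict[Tuple[str, str], int]:
    """Count events per (time bucket, dimension value), like GROUP BY a floored timestamp."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for ts, page in events:
        floored = ts - ts % granularity_s  # floor the Unix timestamp to the bucket start
        bucket = datetime.fromtimestamp(floored, tz=timezone.utc).strftime("%H:%M")
        counts[(bucket, page)] += 1
    return dict(counts)


# (Unix timestamp in seconds, page) pairs; hypothetical sample data.
events = [(0, "home"), (30, "home"), (59, "cart"), (60, "home"), (125, "cart")]
print(bucket_counts(events, 60))
```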
Perspective is a columnar data analytics engine and high-performance visualization component powered by WebAssembly. It provides a system for analyzing and visualizing large or streaming datasets through interactive data grids and charts, utilizing a compiled binary to achieve near-native performance within the browser. The project distinguishes itself through a WebSocket-based data streaming interface and deep Apache Arrow integration, which minimize memory overhead when synchronizing tables between servers and clients. It acts as a remote query proxy capable of translating visualization con
Translates high-level visualization configurations into native queries for external data engines without loading full datasets into memory.
FerretDB is an open-source database emulator and protocol translator that mimics a MongoDB environment to support existing drivers and client tools on a relational backend. It functions as a stateless database proxy that converts binary wire protocol messages into SQL statements, allowing a relational engine to handle document-oriented requests. The project serves as a migration tool for moving applications from MongoDB to PostgreSQL without rewriting queries or changing client drivers. It achieves this by using PostgreSQL as a document store, storing and querying BSON documents through a tra
Translates MongoDB wire protocol queries into SQL statements for a relational storage engine.
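Translating a document query into SQL can be sketched for the simplest case, a flat filter with comparison operators. The sketch assumes each collection is a PostgreSQL table with one `jsonb` column named `_jsonb`; that layout, the `filter_to_sql` function, and the operator mapping are assumptions for illustration, not FerretDB's actual schema or generated SQL.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping from MongoDB comparison operators to SQL operators.
OPS = {"$eq": "=", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}


def filter_to_sql(collection: str, flt: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Translate a flat MongoDB-style filter into a parameterized PostgreSQL query.

    Field and collection names are interpolated unescaped, so this is for illustration only.
    """
    clauses: List[str] = []
    params: List[Any] = []
    for field, cond in flt.items():
        # A bare value means equality: {"name": "ada"} == {"name": {"$eq": "ada"}}.
        ops = cond if isinstance(cond, dict) else {"$eq": cond}
        for op, value in ops.items():
            if isinstance(value, (int, float)):
                # Compare numbers numerically by casting the extracted JSON text.
                clauses.append(f"(_jsonb->>'{field}')::numeric {OPS[op]} %s")
            else:
                clauses.append(f"_jsonb->>'{field}' {OPS[op]} %s")
            params.append(value)
    where = " AND ".join(clauses) if clauses else "TRUE"
    return f"SELECT _jsonb FROM {collection} WHERE {where}", params


print(filter_to_sql("users", {"name": "ada", "age": {"$gt": 30}}))
```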
Perspective is a columnar data analytics library and streaming data visualization engine. It provides an interactive data grid component and notebook analytics widgets designed for processing high-volume data and rendering interactive charts and grids. The system utilizes a high-performance query engine to enable real-time data analysis and streaming dataset visualization. It supports the creation of customizable dashboards and reports that update automatically as new data arrives without requiring full dataset reloads. The project covers large-scale dataset analytics through a schema-driven
Translates interface configurations into native queries to interact with pluggable external data engines.
LiteDB is a serverless NoSQL document store and embedded database engine for .NET applications. It persists unstructured documents and binary data into a single standalone disk file, allowing the database to run within the application process rather than as a separate server. The system supports strongly typed queries through Language Integrated Query and allows the execution of standard SQL commands for data retrieval and transformation. It provides native mapping of plain classes into document formats and secures stored information via symmetric-key file encryption. The engine includes cap
Provides an engine to execute standard SQL commands for data retrieval and transformation.
sqlglot is a SQL parser and transpiler that represents queries as abstract syntax trees to enable structural analysis, modification, and semantic transformation. It functions as a dialect translator and query optimizer, converting SQL code between different database engines and simplifying syntax trees through rule-based normalization. The project provides a framework for defining custom SQL dialects by overriding tokenizers, parsers, and generators. It includes a lineage analyzer to track data flow from source tables through complex queries to identify the origin of specific columns. Additi
Interprets and executes SQL queries natively using Python dictionaries as data sources.
TextQL is a command-line SQL query engine designed to execute relational queries directly against structured text files, such as CSV and TSV, without requiring a database import. It functions as a relational text file analyzer and a CSV processor that treats plain text files as virtual tables for filtering, joining, and aggregating data. The tool is built as a pipe-compatible data transformation utility, allowing it to process data from standard input and output formatted datasets. It enables relational joins across multiple files or directories within a single query to analyze relationships
Acts as a SQL query execution engine that processes relational commands directly against structured text files.
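The core idea of treating a text file as a virtual table can be approximated with Python's standard `csv` and `sqlite3` modules. This is a hypothetical sketch (`query_csv` is an invented helper), not how TextQL is implemented: the CSV is loaded into a throwaway in-memory SQLite table, queried once, and discarded. Because every CSV value arrives as text, numeric work needs an explicit `CAST`.

```python
import csv
import io
import sqlite3
from typing import List, Tuple


def query_csv(text: str, table: str, sql: str) -> List[Tuple]:
    """Load CSV text into a throwaway in-memory table and run one SQL query against it."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")  # nothing is persisted to disk
    cols = ", ".join(f'"{h}"' for h in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


data = "name,dept,salary\nada,eng,120\nbob,ops,90\ncyd,eng,100\n"
print(query_csv(data, "staff", "SELECT dept, SUM(CAST(salary AS INTEGER)) FROM staff GROUP BY dept ORDER BY dept"))
```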
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Executes structured query language commands to ingest, process, and retrieve real-time data.
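Continuous SQL over streams usually relies on materialized views that are updated incrementally instead of being recomputed. The sketch below maintains a grouped count and average as inserts and retractions arrive; the `IncrementalAvg` class is hypothetical and illustrates the technique in general, not RisingWave's internals.

```python
from collections import defaultdict
from typing import Dict, Tuple


class IncrementalAvg:
    """Maintains SELECT key, COUNT(*), AVG(v) ... GROUP BY key incrementally."""

    def __init__(self) -> None:
        self.count: Dict[str, int] = defaultdict(int)
        self.total: Dict[str, float] = defaultdict(float)

    def apply(self, key: str, value: float, insert: bool = True) -> None:
        # Each change updates only the affected group; nothing is recomputed from scratch.
        sign = 1 if insert else -1
        self.count[key] += sign
        self.total[key] += sign * value
        if self.count[key] == 0:
            # Drop empty groups, as a GROUP BY result would.
            del self.count[key]
            del self.total[key]

    def view(self) -> Dict[str, Tuple[int, float]]:
        return {k: (c, self.total[k] / c) for k, c in self.count.items()}


mv = IncrementalAvg()
mv.apply("sensor-a", 10.0)
mv.apply("sensor-a", 20.0)
mv.apply("sensor-b", 5.0)
mv.apply("sensor-b", 5.0, insert=False)  # a retraction removes the row's contribution
print(mv.view())
```

Storing the running count and sum rather than the average itself is what makes retractions cheap: both can be decremented exactly.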
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
Provides a standard SQL interface for exploring and analyzing stored data.
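The combination of vector similarity with scalar filters mentioned above can be sketched with a brute-force search. The rows, the `search` function, and the `year` filter are hypothetical; the sketch pre-filters on the scalar column and ranks the survivors by cosine similarity. It omits the BM25 full-text component and the approximate indexes a real vector database would use.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(rows: List[Dict], query: List[float], k: int, min_year: int) -> List[Tuple[str, float]]:
    """Brute-force top-k by cosine similarity, restricted by a scalar filter (year >= min_year)."""
    candidates = [r for r in rows if r["year"] >= min_year]  # scalar pre-filter
    scored = [(r["id"], cosine(r["vector"], query)) for r in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


rows = [
    {"id": "a", "year": 2021, "vector": [1.0, 0.0]},
    {"id": "b", "year": 2019, "vector": [0.9, 0.1]},
    {"id": "c", "year": 2023, "vector": [0.0, 1.0]},
    {"id": "d", "year": 2022, "vector": [0.7, 0.7]},
]
print(search(rows, [1.0, 0.0], k=2, min_year=2020))
```

Row `b` is very close to the query vector but is excluded by the filter, which is the behavior a scalar predicate is meant to guarantee.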
Apache DataFusion is an extensible, columnar SQL query engine that runs embedded within a host application without requiring a separate server process. It processes data in columnar batches using Apache Arrow for memory-efficient analytics, and can scale analytic workloads across multiple nodes for parallel execution. The engine supports both SQL and DataFrame queries through a modular, streaming architecture that allows custom operators, data sources, functions, and optimizer rules. The engine distinguishes itself through its modular extension framework, which enables building custom query e
Runs as an embedded SQL engine within a host application without requiring a separate server process.
ToyDB is a distributed SQL database that provides a system for storing and querying data across multiple nodes. It focuses on maintaining strong consistency and fault tolerance through the implementation of a distributed consensus algorithm. The project distinguishes itself by supporting historical data versioning, enabling time-travel queries to retrieve the state of the database from a specific point in the past. It utilizes multi-version concurrency control to manage ACID transactions and ensure data integrity during concurrent operations. The system covers relational data modeling with t
Provides an engine that processes standard SQL commands while maintaining strict consistency and isolation.
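The versioning behind time-travel queries can be sketched as a key-value store that never overwrites data: each write appends a version tagged with a commit timestamp, and a read returns the newest version at or before a chosen timestamp. `MVCCStore` is a hypothetical single-threaded illustration; it leaves out transactions, conflict detection, and consensus, and it is not ToyDB's code.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class MVCCStore:
    """Each write creates a new version tagged with a commit timestamp; reads pick a snapshot."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Tuple[int, Optional[str]]]] = {}
        self.clock = 0

    def write(self, key: str, value: Optional[str]) -> int:
        # Old versions are never overwritten, which is what makes time travel possible.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key: str, as_of: Optional[int] = None) -> Optional[str]:
        """Return the newest value committed at or before `as_of` (latest if None)."""
        history = self.versions.get(key, [])
        ts = self.clock if as_of is None else as_of
        # Versions are appended in timestamp order, so binary search on the timestamps.
        idx = bisect.bisect_right([t for t, _ in history], ts)
        return history[idx - 1][1] if idx else None


db = MVCCStore()
t1 = db.write("balance", "100")
t2 = db.write("balance", "80")
db.write("balance", None)  # a delete is recorded as a tombstone version
print(db.read("balance"), db.read("balance", as_of=t1), db.read("balance", as_of=t2))
```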
This project provides a SQL interface for Elasticsearch, serving as a translator and database layer that allows users to retrieve, filter, and manipulate indices using structured query language. It functions by converting standard SQL statements into the native JSON query language used by the search engine. The system includes a geospatial SQL engine for executing location-based searches and distance calculations. It also features a query debugger used to visualize the translation process from SQL to search engine request bodies to verify the logic and accuracy of data retrieval. The capabil
Translates structured SQL statements into the native JSON query language used by the search engine.
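The SQL-to-JSON translation can be sketched for a narrow subset: `AND`-joined conditions of the form `field op literal`. The `where_to_dsl` function and its regular expressions are hypothetical; the output uses Elasticsearch's `bool`, `term`, and `range` query clauses. It assumes the compared fields are keyword or numeric fields, since a `term` query against an analyzed text field may not match as expected.

```python
import re
from typing import Any, Dict, List

RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}
# field, operator (two-character operators tried first), then a quoted string or a number
COND = re.compile(r"^\s*(\w+)\s*(>=|<=|=|>|<)\s*('[^']*'|\d+(?:\.\d+)?)\s*$")


def where_to_dsl(where: str) -> Dict[str, Any]:
    """Translate "a = 'x' AND b > 3" style conditions into an Elasticsearch bool query."""
    filters: List[Dict[str, Any]] = []
    for part in re.split(r"\s+AND\s+", where, flags=re.IGNORECASE):
        m = COND.match(part)
        if not m:
            raise ValueError(f"unsupported condition: {part!r}")
        field, op, raw = m.groups()
        value: Any = raw[1:-1] if raw.startswith("'") else float(raw)
        if op == "=":
            filters.append({"term": {field: value}})
        else:
            filters.append({"range": {field: {RANGE_OPS[op]: value}}})
    return {"query": {"bool": {"filter": filters}}}


print(where_to_dsl("city = 'Paris' AND population >= 100000"))
```

Putting the clauses under `filter` rather than `must` reflects that a SQL `WHERE` only includes or excludes rows and does not rank them.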
Osmedeus is a security workflow orchestration engine that coordinates AI agents, shell commands, and scanning tools through declarative YAML pipelines. It functions as a distributed security scanner, a declarative workflow automator, and an AI agent framework for security, enabling automated multi-step security analysis with conditional branching, parallel execution, and distributed workers. The engine distinguishes itself through a hybrid runner model that executes workflow steps on the local host, inside Docker containers, or over SSH to remote machines, selected per step or module. It supp
Provides functions to extract data from JSON strings or files using jq query syntax.
csvkit is a composable Unix-style command-line toolkit for converting, filtering, and analyzing CSV files directly from the terminal. It provides a suite of focused single-purpose commands that can be combined via pipes to build complex data processing workflows, with a modular architecture that includes a column-type inference engine for automatically detecting data types and a streaming-pipeline design for efficient handling of tabular data. The toolkit distinguishes itself through its SQL-engine abstraction layer, which allows users to run SQL queries directly against CSV files without req
Computes summary statistics and aggregations entirely in memory using Python data structures.
Gitql is a SQL query engine and metadata search tool designed to explore and retrieve information from version control systems. It provides a structured query language that lets users filter and extract assets from a project's history using a formal syntax instead of standard command-line commands. The tool works as an interactive data explorer, with a command-line interface for evaluating repository data in real time. It maps version-control entities, such as commits and tags, to virtual relational tables, allowing sequential queries to audit history and analyze repository metadata. The system covers broad capabilities for retrieving repository assets, including extracting information from pull requests, issues, and discussions. It also includes repository schema exploration mechanisms to identify which data tables are available for querying.
Executes structured SQL queries against Git repository metadata to filter and retrieve version control information.
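Mapping version-control entities to relational tables can be sketched by loading commit records into an in-memory SQLite table. The commit data and the `commits_per_author` function are hypothetical, and the query uses SQLite syntax rather than Gitql's own query language.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical commit records; a real tool would read these from the repository itself.
COMMITS: List[Tuple[str, str, str, str]] = [
    ("a1b2c3", "ada", "2024-01-05", "Add parser"),
    ("d4e5f6", "bob", "2024-01-09", "Fix tokenizer bug"),
    ("0719ab", "ada", "2024-02-11", "Add optimizer"),
]


def commits_per_author(commits: List[Tuple[str, str, str, str]], since: str) -> List[Tuple[str, int]]:
    """Expose commits as a table and answer a history question with plain SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (hash TEXT, author TEXT, date TEXT, message TEXT)")
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?, ?)", commits)
    # ISO-8601 date strings compare correctly as text.
    rows = conn.execute(
        "SELECT author, COUNT(*) FROM commits WHERE date >= ? GROUP BY author ORDER BY author",
        (since,),
    ).fetchall()
    conn.close()
    return rows


print(commits_per_author(COMMITS, "2024-01-06"))
```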