33 个仓库
Engines that process standard SQL commands while maintaining strict consistency and isolation.
Distinguishing note: No candidates provided; focuses on query processing.
Explore 33 awesome GitHub repositories matching data & databases · SQL Query Execution Engines. Refine with filters or upvote what's useful.
DuckDB 是一个嵌入式、进程内的分析型 SQL 数据库和 OLAP 数据库管理系统。它作为 Parquet 和 CSV 文件的数据库引擎,允许用户在大型数据集上执行复杂的 SQL 查询,而无需单独的服务器进程。 该系统专为本地分析处理和嵌入式数据科学工作流而设计。它支持直接从磁盘查询和分析 Parquet 和 CSV 文件,无需将数据加载到永久数据库中。 该引擎提供高性能的分析型 SQL 执行,包括对窗口函数和嵌套子查询的支持。它采用列式存储布局和向量化查询执行,以处理大规模数据操作和探索。 该数据库可通过独立的命令行界面以及 Python、R、Java 和 Wasm 的特定语言绑定进行访问。
Provides a high-performance SQL execution engine supporting advanced window functions and nested subqueries.
Cockroach is a distributed SQL database designed to scale horizontally across multiple nodes while maintaining strict ACID compliance and global data consistency. It functions as a relational database engine that automatically partitions data into ranges, rebalancing them across a cluster to accommodate growing storage and throughput requirements. By utilizing a distributed consensus protocol, the system ensures that all nodes agree on the order of operations, providing fault tolerance and continuous availability even in the event of hardware failures. The system distinguishes itself through
Processes database commands using standard syntax while maintaining strict data consistency and isolation.
Cube is a semantic data layer that provides a unified framework for defining business metrics, dimensions, and relationships across diverse data sources. By acting as a headless business intelligence engine, it transforms raw data into a governed model that can be queried via SQL, REST, and GraphQL interfaces. This architecture ensures consistent data definitions and logic across all downstream analytical applications and reporting tools. The platform distinguishes itself through its integrated conversational AI capabilities, which allow users to explore data using natural language. It orches
Transforms high-level analytical requests into optimized, dialect-specific SQL queries tailored for diverse underlying database engines.
Presto is a distributed SQL query engine designed for high-performance analytical processing across heterogeneous data sources. It functions as a data federation platform and massively parallel processing engine, allowing users to execute interactive queries against diverse storage systems without requiring data migration. By mapping remote metadata and structures to a unified relational namespace, it enables seamless cross-platform analysis through a standard SQL interface. The engine distinguishes itself through a pluggable connector architecture and a shared-nothing distributed processing
Provides a high-performance distributed SQL engine for interactive analytical queries across heterogeneous data sources.
Floci is a local emulator for AWS services and cloud infrastructure designed for developing and testing applications without a live internet connection. It serves as a containerized cloud emulator and a serverless runtime emulator, allowing users to run high-fidelity replicas of cloud databases, queues, and compute services on a local machine. The project distinguishes itself by using real container images instead of simple mocks to ensure behavioral accuracy. It functions as a local API gateway simulator with proxy-based routing for REST and WebSocket APIs, and provides a serverless environm
Provides a local SQL execution engine to run queries against an emulated data lake.
Apache Druid is a real-time analytics database and distributed columnar time-series store designed for sub-second analytical queries. It functions as a data platform featuring a distributed SQL query engine and a real-time data ingestion system for moving historical and streaming data from external sources. The system is distinguished by its ability to provide low-latency analytics under high concurrency to power operational dashboards. It implements a Kerberos-secured environment for user authentication and employs a shared-nothing cluster architecture to enable horizontal scaling. The plat
Processes standard SQL and native queries across a distributed cluster to analyze large-scale datasets.
Perspective is a columnar data analytics engine and high-performance visualization component powered by WebAssembly. It provides a system for analyzing and visualizing large or streaming datasets through interactive data grids and charts, utilizing a compiled binary to achieve near-native performance within the browser. The project distinguishes itself through a WebSocket-based data streaming interface and deep Apache Arrow integration, which minimize memory overhead when synchronizing tables between servers and clients. It acts as a remote query proxy capable of translating visualization con
Translates high-level visualization configurations into native queries for external data engines without loading full datasets into memory.
FerretDB is an open-source database emulator and protocol translator that mimics a MongoDB environment to support existing drivers and client tools on a relational backend. It functions as a stateless database proxy that converts binary wire protocol messages into SQL statements, allowing a relational engine to handle document-oriented requests. The project serves as a migration tool for moving applications from MongoDB to PostgreSQL without rewriting queries or changing client drivers. It achieves this by using PostgreSQL as a document store, storing and querying BSON documents through a tra
Translates MongoDB wire protocol queries into SQL statements for a relational storage engine.
Perspective is a columnar data analytics library and streaming data visualization engine. It provides an interactive data grid component and notebook analytics widgets designed for processing high-volume data and rendering interactive charts and grids. The system utilizes a high-performance query engine to enable real-time data analysis and streaming dataset visualization. It supports the creation of customizable dashboards and reports that update automatically as new data arrives without requiring full dataset reloads. The project covers large-scale dataset analytics through a schema-driven
Translates interface configurations into native queries to interact with pluggable external data engines.
LiteDB is a serverless NoSQL document store and embedded database engine for .NET applications. It persists unstructured documents and binary data into a single standalone disk file, allowing the database to run within the application process rather than as a separate server. The system supports strongly typed queries through Language Integrated Query and allows the execution of standard SQL commands for data retrieval and transformation. It provides native mapping of plain classes into document formats and secures stored information via symmetric-key file encryption. The engine includes cap
Provides an engine to execute standard SQL commands for data retrieval and transformation.
sqlglot is a SQL parser and transpiler that represents queries as abstract syntax trees to enable structural analysis, modification, and semantic transformation. It functions as a dialect translator and query optimizer, converting SQL code between different database engines and simplifying syntax trees through rule-based normalization. The project provides a framework for defining custom SQL dialects by overriding tokenizers, parsers, and generators. It includes a lineage analyzer to track data flow from source tables through complex queries to identify the origin of specific columns. Additi
Interprets and executes SQL queries natively using Python dictionaries as data sources.
TextQL is a command line SQL query engine designed to execute relational queries directly against structured text files, such as CSV and TSV, without requiring a database import. It functions as a relational text file analyzer and a CSV processor that treats plain text files as virtual tables for filtering, joining, and aggregating data. The tool is built as a pipe-compatible data transformation utility, allowing it to process data from standard input and output formatted datasets. It enables relational joins across multiple files or directories within a single query to analyze relationships
Acts as a SQL query execution engine that processes relational commands directly against structured text files.
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Executes structured query language commands to ingest, process, and retrieve real-time data.
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
Provides a standard SQL interface for exploring and analyzing stored data.
Apache DataFusion is an extensible, columnar SQL query engine that runs embedded within a host application without requiring a separate server process. It processes data in columnar batches using Apache Arrow for memory-efficient analytics, and can scale analytic workloads across multiple nodes for parallel execution. The engine supports both SQL and DataFrame queries through a modular, streaming architecture that allows custom operators, data sources, functions, and optimizer rules. The engine distinguishes itself through its modular extension framework, which enables building custom query e
Runs as an embedded SQL engine within a host application without requiring a separate server process.
ToyDB is a distributed SQL database that provides a system for storing and querying data across multiple nodes. It focuses on maintaining strong consistency and fault tolerance through the implementation of a distributed consensus algorithm. The project distinguishes itself by supporting historical data versioning, enabling time-travel queries to retrieve the state of the database from a specific point in the past. It utilizes multi-version concurrency control to manage ACID transactions and ensure data integrity during concurrent operations. The system covers relational data modeling with t
Provides an engine that processes standard SQL commands while maintaining strict consistency and isolation.
This project provides a SQL interface for Elasticsearch, serving as a translator and database layer that allows users to retrieve, filter, and manipulate indices using structured query language. It functions by converting standard SQL statements into the native JSON query language used by the search engine. The system includes a geospatial SQL engine for executing location-based searches and distance calculations. It also features a query debugger used to visualize the translation process from SQL to search engine request bodies to verify the logic and accuracy of data retrieval. The capabil
Translates structured SQL statements into the native JSON query language used by the search engine.
Osmedeus is a security workflow orchestration engine that coordinates AI agents, shell commands, and scanning tools through declarative YAML pipelines. It functions as a distributed security scanner, a declarative workflow automator, and an AI agent framework for security, enabling automated multi-step security analysis with conditional branching, parallel execution, and distributed workers. The engine distinguishes itself through a hybrid runner model that executes workflow steps on the local host, inside Docker containers, or over SSH to remote machines, selected per step or module. It supp
Provides functions to extract data from JSON strings or files using jq query syntax.
csvkit is a composable Unix-style command-line toolkit for converting, filtering, and analyzing CSV files directly from the terminal. It provides a suite of focused single-purpose commands that can be combined via pipes to build complex data processing workflows, with a modular architecture that includes a column-type inference engine for automatically detecting data types and a streaming-pipeline design for efficient handling of tabular data. The toolkit distinguishes itself through its SQL-engine abstraction layer, which allows users to run SQL queries directly against CSV files without req
Computes summary statistics and aggregations entirely in memory using Python data structures.
Gitql 是一个 SQL 查询引擎和元数据搜索工具,旨在探索和检索版本控制系统中的信息。它提供了一种结构化查询语言,允许用户使用正式语法而不是标准命令行界面命令来过滤和提取项目历史中的资产。 该工具作为一个交互式数据浏览器,具有用于实时评估仓库数据的命令提示符界面。它将版本控制实体(如提交和标签)映射到虚拟关系表,从而能够执行顺序查询以审计历史并分析仓库元数据。 该系统涵盖了广泛的仓库资产检索功能,包括从拉取请求、议题和讨论中提取信息。它还包括用于仓库模式探索的机制,以识别哪些数据表可供查询。
Executes structured SQL queries against Git repository metadata to filter and retrieve version control information.