The visitor is looking for a high-performance, embeddable analytical database engine designed to query local datasets directly without a dedicated server.

cwida/duckdb (DuckDB's original repository location, now duckdb/duckdb) is the closest match: DuckDB is a purpose-built, in-process analytical database that provides columnar storage, SQL support, and high-performance vectorized execution for querying local files like Parquet and CSV without a server. Other strong matches: duckdb/duckdb, clickhouse/clickhouse, vaexio/vaex, lmdb/lmdb.

Fast In-Process Analytics Engines

High-performance libraries for executing complex analytical queries directly against local data files and datasets.

cwida/duckdb
38,822 GitHub stars
DuckDB is an embedded, in-process analytical SQL database and OLAP database management system. It functions as a data engine for Parquet and CSV files, allowing users to execute complex SQL queries on large datasets without requiring a separate server process. The system is designed for local analytical processing and embedded data science workflows. It enables the direct querying and analysis of Parquet and CSV files from disk, bypassing the need to load data into a permanent database. The engine provides high-performance analytical SQL execution, including support for window functions and…
DuckDB is a purpose-built, in-process analytical database that provides columnar storage, SQL support, and high-performance vectorized execution for querying local files like Parquet and CSV without a server.
C++ | Embedded Databases | In-Process Analytics | Columnar Databases
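
The entry above mentions querying CSV files straight from disk instead of first loading them into a permanent database. In DuckDB itself this is written as SQL with a file path in the FROM clause (for example, SELECT region, SUM(amount) FROM 'sales.csv' GROUP BY region). The stdlib-only Python sketch below is not DuckDB; it only illustrates the underlying idea, assuming a single streaming pass in which the filter and the GROUP BY aggregate are applied while rows are read, so only the running totals stay in memory. The file name, column names, and the scan_group_sum helper are hypothetical.

```python
import csv
import io
from collections import defaultdict


def scan_group_sum(lines, key_col, value_col, min_value=None):
    """Single pass over CSV text, roughly:
    SELECT key_col, SUM(value_col) FROM file WHERE value_col >= min_value GROUP BY key_col
    """
    totals = defaultdict(float)  # the only state kept: one running sum per group
    for row in csv.DictReader(lines):  # rows are parsed one at a time
        value = float(row[value_col])
        if min_value is not None and value < min_value:
            continue  # filter applied during the scan
        totals[row[key_col]] += value
    return dict(totals)


# Hypothetical data; a real file would be passed as open("sales.csv", newline="").
sample = io.StringIO("region,amount\neu,10\nus,5\neu,2.5\nus,1\n")
print(scan_group_sum(sample, "region", "amount", min_value=2))  # {'eu': 12.5, 'us': 5.0}
```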
duckdb/duckdb
38,805 GitHub stars
DuckDB is an in-process analytical database engine designed to run directly within an application process. As a zero-dependency, embedded system, it provides enterprise-grade SQL data processing capabilities without the overhead of managing a dedicated database server. It is built to handle complex analytical and aggregation tasks by storing and retrieving information in columns, allowing for high-performance relational data manipulation. The engine distinguishes itself through a columnar vectorized execution model that maximizes CPU cache efficiency during query operations. It employs adapti…
DuckDB is a purpose-built, in-process analytical database that provides columnar storage, SQL support, and zero-copy data access, making it a comprehensive solution for high-performance local data querying.
C++ | Embedded Databases | In-Process Analytics | Columnar Engines
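
The columnar, vectorized execution model described above can be sketched as a toy, assuming a table stored as one contiguous array per column and processed in fixed-size batches ("vectors"): the predicate is evaluated over a slice of one column to build a selection vector, and only those positions are then read from a second column. VECTOR_SIZE and columnar_filtered_sum are hypothetical names, and plain Python loops will not show the cache-efficiency gains; the point is the data layout and the batch-at-a-time control flow.

```python
from array import array

VECTOR_SIZE = 2048  # illustrative batch size (an assumption for this sketch)


def columnar_filtered_sum(filter_col, sum_col, threshold, vector_size=VECTOR_SIZE):
    """SUM(sum_col) WHERE filter_col > threshold, evaluated one vector (batch) at a time."""
    total = 0.0
    n = len(filter_col)
    for start in range(0, n, vector_size):
        end = min(start + vector_size, n)
        # Step 1: evaluate the predicate over a contiguous slice of ONE column,
        # producing a selection vector of qualifying offsets within the batch.
        selection = [i for i, v in enumerate(filter_col[start:end]) if v > threshold]
        # Step 2: read only the selected positions from the second column.
        values = sum_col[start:end]
        total += sum(values[i] for i in selection)
    return total


# Each column is stored contiguously, so a scan touches only the columns it needs.
prices = array("d", [float(i % 100) for i in range(10_000)])
quantities = array("d", [1.0] * 10_000)
print(columnar_filtered_sum(prices, quantities, threshold=89.0))  # 1000.0
```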
ClickHouse/ClickHouse
48,229 GitHub stars
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through ad…
ClickHouse is a high-performance, columnar analytical database that supports embedded usage and direct file querying, though it is primarily architected as a large-scale distributed server rather than a lightweight, library-only engine.
C++ | Columnar Storage Engines | Embedded Database Engines | Query Optimization Engines
vaexio/vaex
8,506 GitHub stars
Vaex is a high-performance Apache Arrow DataFrame library and out-of-core data processing engine designed to handle billion-row tabular datasets in Python. It functions as a lazy evaluation framework that defers computations and transformations until results are required, enabling the processing of datasets that exceed available system RAM by mapping files directly from disk. The project distinguishes itself as a tool for big data visualization and exploration, specifically integrated for use within interactive notebooks. It provides specialized capabilities for machine learning feature engin…
Vaex is a high-performance, memory-mapped DataFrame library that provides the core analytical capabilities and out-of-core processing required for querying large local datasets, though it functions as a Python-based computation engine rather than a traditional SQL-based database.
Python | Memory-Mapped File Access
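
A minimal sketch of the two ideas in the Vaex entry, lazy evaluation and out-of-core processing over a memory-mapped file, assuming a raw file of little-endian float64 values with no header. LazyColumn is a hypothetical class, not Vaex's API: map() only composes functions, and nothing touches the data until mean() maps the file and scans it in fixed-size chunks.

```python
import mmap
import os
import struct
import tempfile

ROW = struct.Struct("<d")  # assumed layout: raw little-endian float64 values, no header


class LazyColumn:
    """Hypothetical lazy column over a memory-mapped file (not Vaex's API)."""

    def __init__(self, path, transform=None):
        self.path = path
        self.transform = transform or (lambda x: x)  # building an expression computes nothing

    def map(self, fn):
        prev = self.transform
        # Compose functions; the file is still untouched.
        return LazyColumn(self.path, lambda x: fn(prev(x)))

    def mean(self, chunk_rows=4096):
        # Evaluation happens only here: map the file and scan it chunk by chunk,
        # letting the OS page data in and out instead of loading it all at once.
        total, count = 0.0, 0
        chunk_bytes = chunk_rows * ROW.size
        with open(self.path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for start in range(0, len(mm), chunk_bytes):
                for (x,) in ROW.iter_unpack(mm[start:start + chunk_bytes]):
                    total += self.transform(x)
                    count += 1
        return total / count if count else float("nan")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "x.f64")
    with open(path, "wb") as f:
        f.write(struct.pack("<4d", 1.0, 2.0, 3.0, 4.0))
    expr = LazyColumn(path).map(lambda x: x * 2).map(lambda x: x + 1)  # still lazy
    print(expr.mean())  # values become 3, 5, 7, 9 -> 6.0
```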
LMDB/lmdb
2,907 GitHub stars
LMDB is an embedded key-value storage engine that provides ACID-compliant data persistence. It is a memory-mapped database that utilizes B+ trees to store key-value pairs, ensuring atomicity, consistency, isolation, and durability. The engine maps files directly into the virtual address space to minimize data copying and system calls. This approach enables high-performance local caching and low-latency data access, specifically optimizing for read-heavy database workflows. The system implements a transactional model with copy-on-write versioning and single-writer multi-reader locking. These…
LMDB is a high-performance, memory-mapped key-value store, but it lacks the columnar storage and native SQL support required for an analytical database engine.
C | Memory-Mapped File Access | Memory-Mapped Storage
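
The copy-on-write versioning and single-writer/multi-reader model mentioned above can be illustrated with a toy store. This is a simplification, not LMDB's implementation: LMDB copies only the B+ tree pages on the modified path, whereas CowStore (a hypothetical name) copies the whole dictionary. The visibility rule is the same, though: a reader keeps the version it started with, and a commit publishes a new version by swapping a single root reference.

```python
import threading
from types import MappingProxyType


class CowStore:
    """Toy single-writer / multi-reader store with copy-on-write snapshots."""

    def __init__(self):
        self._root = MappingProxyType({})  # current committed version (read-only view)
        self._writer = threading.Lock()    # only one write transaction at a time

    def begin_read(self):
        # A reader just grabs the current root; later commits never mutate it.
        return self._root

    def write(self, updates):
        with self._writer:
            new_version = dict(self._root)  # copy; never modify the published version
            new_version.update(updates)
            self._root = MappingProxyType(new_version)  # publish by swapping the root reference


store = CowStore()
store.write({"a": 1})
snapshot = store.begin_read()
store.write({"a": 2, "b": 3})
print(snapshot["a"], store.begin_read()["a"])  # 1 2
```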
lancedb/lancedb
9,031 GitHub stars
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters…
LanceDB is an embeddable, columnar-based storage engine that supports SQL-like filtering and zero-copy data access, making it a strong fit for high-performance local analytical querying despite its primary focus on vector search.
HTML | Analytical Query Engines | Columnar Storage Engines | Zero-Copy Data Access
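
Combining vector similarity with a scalar filter, as the LanceDB entry describes, can be sketched in a few lines. This brute-force version assumes small in-memory rows with hypothetical vector and year fields and applies the filter before ranking (pre-filtering). Production vector engines generally rely on approximate nearest-neighbor indexes rather than a full scan, and this sketch omits the BM25 full-text part.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_vector_search(rows, query, k, predicate):
    """Brute-force top-k by cosine similarity, applying the scalar filter first."""
    candidates = [r for r in rows if predicate(r)]  # pre-filter on scalar columns
    candidates.sort(key=lambda r: cosine(r["vector"], query), reverse=True)
    return [r["id"] for r in candidates[:k]]


# Hypothetical rows: an embedding plus a scalar column to filter on.
rows = [
    {"id": 1, "vector": [1.0, 0.0], "year": 2021},
    {"id": 2, "vector": [0.9, 0.1], "year": 2019},
    {"id": 3, "vector": [0.0, 1.0], "year": 2022},
    {"id": 4, "vector": [0.7, 0.7], "year": 2023},
]
print(filtered_vector_search(rows, [1.0, 0.0], k=2, predicate=lambda r: r["year"] >= 2020))  # [1, 4]
```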
pola-rs/polars
38,855 GitHub stars
Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorized operations. The engine is built to handle large-scale tabular datasets, providing both local and distributed analytical runtimes that scale from single-machine environments to multi-node clusters. The project distinguishes itself through a sophisticated lazy query engine that constructs abstract e…
Polars is a high-performance, in-process columnar data processing engine that provides the requested analytical capabilities and zero-copy memory architecture, though it is primarily used as a dataframe library rather than a traditional SQL-based database engine.
Rust | Columnar Storage Engines
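
The lazy query engine idea in the Polars entry (record operations first, optimize the plan, execute on collect) can be illustrated with a toy plan that performs predicate pushdown. In Polars itself this corresponds to starting from scan_csv or scan_parquet and finishing with collect(); the LazyPlan class below is hypothetical and does not use Polars. Its only optimization moves every filter ahead of the projections so rows are discarded during the scan, which is safe here because projections only drop columns.

```python
from dataclasses import dataclass, field


@dataclass
class LazyPlan:
    """Toy lazy query: operations are recorded, then optimized and run on collect()."""
    source: list  # list of dict rows, a stand-in for a file scan
    ops: list = field(default_factory=list)

    def select(self, *cols):
        return LazyPlan(self.source, self.ops + [("select", cols)])

    def filter(self, col, pred):
        return LazyPlan(self.source, self.ops + [("filter", (col, pred))])

    def optimize(self):
        # Predicate pushdown: every filter runs before any projection.
        filters = [op for op in self.ops if op[0] == "filter"]
        selects = [op for op in self.ops if op[0] == "select"]
        return filters + selects

    def collect(self):
        plan = self.optimize()
        filters = [arg for kind, arg in plan if kind == "filter"]
        selects = [arg for kind, arg in plan if kind == "select"]
        out = []
        for row in self.source:  # the scan
            if all(pred(row[col]) for col, pred in filters):  # pushed-down predicates
                for cols in selects:
                    row = {c: row[c] for c in cols}
                out.append(row)
        return out


rows = [{"city": "Oslo", "temp": 3, "id": 1}, {"city": "Rome", "temp": 18, "id": 2}]
q = LazyPlan(rows).select("city", "temp").filter("temp", lambda t: t > 10)  # nothing runs yet
print(q.collect())  # [{'city': 'Rome', 'temp': 18}]
```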
OpenHFT/Chronicle-Queue
3,692 GitHub stars
Chronicle Queue is a high-performance data handling system featuring off-heap message queues, memory-mapped file stores, and replicated message stores. It provides a binary-compatible memory layout that enables different programming languages to share data without serialization overhead. The system utilizes a replicated message store to synchronize data across multiple nodes, ensuring high availability and instant failover. Its memory-mapped architecture supports deterministic replay from disk and low-latency data recording. The project implements off-heap memory management and zero-allocati…
Chronicle Queue is a high-performance, low-latency messaging and event-sourcing library rather than an analytical database engine, as it lacks SQL support and columnar query capabilities.
Java | Memory-Mapped File Access | Memory-Mapped Storage
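
The memory-mapped, append-only storage with deterministic replay described above can be sketched with Python's mmap module. This is an illustration under stated assumptions, not Chronicle Queue's on-disk format: MappedLog (hypothetical) pre-sizes a file, appends length-prefixed records into the mapping, and replays them in write order. It keeps the tail offset in memory only, so it cannot recover its position after a restart.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<I")  # 4-byte little-endian length prefix per record


class MappedLog:
    """Toy append-only log in a pre-sized memory-mapped file (not Chronicle Queue's format)."""

    def __init__(self, path, capacity=1 << 16):
        with open(path, "wb") as f:
            f.truncate(capacity)  # pre-size the file; the mapping cannot grow it
        self._file = open(path, "r+b")
        self._mm = mmap.mmap(self._file.fileno(), capacity)
        self._tail = 0  # next write offset

    def append(self, payload: bytes):
        end = self._tail + HEADER.size + len(payload)
        if end > len(self._mm):
            raise ValueError("log is full")
        HEADER.pack_into(self._mm, self._tail, len(payload))  # write length prefix
        self._mm[self._tail + HEADER.size:end] = payload      # write record body
        self._tail = end

    def replay(self):
        # Deterministic replay: yield records in exactly the order they were written.
        offset = 0
        while offset < self._tail:
            (length,) = HEADER.unpack_from(self._mm, offset)
            start = offset + HEADER.size
            yield self._mm[start:start + length]
            offset = start + length

    def close(self):
        self._mm.flush()
        self._mm.close()
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    log = MappedLog(os.path.join(tmp, "events.q"))
    for event in (b"order:1", b"fill:1", b"cancel:2"):
        log.append(event)
    print(list(log.replay()))  # [b'order:1', b'fill:1', b'cancel:2']
    log.close()
```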