6 repositories
Storage engines optimized for analytical queries and large-scale data aggregation using columnar data structures.
Explore 6 GitHub repositories in Data & Databases · Columnar Databases.
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through ad
Provides a high-performance storage engine specifically architected for real-time analytical query execution and large-scale data aggregation.
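DuckDB is an embedded, in-process analytical SQL database and OLAP database management system. It serves as a database engine for Parquet and CSV files, letting users run complex SQL queries over large datasets without a separate server process. The system is designed for local analytical processing and embedded data science workflows. It supports querying and analyzing Parquet and CSV files directly from disk, without loading the data into a persistent database. The engine provides high-performance analytical SQL execution, including support for window functions and nested subqueries. It uses a columnar storage layout and vectorized query execution to handle large-scale data operations and exploration. The database is accessible through a standalone command-line interface and language-specific bindings for Python, R, Java, and Wasm.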
Implements a storage engine optimized for analytical queries using columnar data structures.
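As a rough illustration of the columnar layout mentioned above, the following plain-Python sketch pivots row-oriented records into one sequence per column and then aggregates by reading only the columns the query needs. It does not use DuckDB or reflect how any engine listed here is implemented; the records, field names, and helper functions (`to_columns`, `sum_amount_by_country`) are hypothetical and exist only for this example.

```python
from array import array

# Hypothetical event records, used only for illustration.
rows = [
    {"user_id": 1, "country": "DE", "amount": 12.5},
    {"user_id": 2, "country": "US", "amount": 7.0},
    {"user_id": 3, "country": "DE", "amount": 30.25},
    {"user_id": 4, "country": "FR", "amount": 4.0},
]


def to_columns(records):
    """Pivot row-oriented records into one sequence per column."""
    return {
        # Typed arrays keep numeric columns in contiguous memory.
        "user_id": array("q", (r["user_id"] for r in records)),
        "country": [r["country"] for r in records],
        "amount": array("d", (r["amount"] for r in records)),
    }


def sum_amount_by_country(columns):
    """Aggregate while touching only the two columns the query needs."""
    totals = {}
    for country, amount in zip(columns["country"], columns["amount"]):
        totals[country] = totals.get(country, 0.0) + amount
    return totals


columns = to_columns(rows)
totals = sum_amount_by_country(columns)
print(totals)
```

The aggregation never reads `user_id`. On disk, that is the property that lets a columnar engine skip unneeded columns entirely, whereas a row-oriented layout would have to scan every field of every record.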
PostHog is a comprehensive product analytics and feature management platform designed to capture, process, and visualize user behavior data. It provides a unified suite for tracking application events, managing feature rollouts, and monitoring system health through session recordings and error tracking. By leveraging a columnar-storage-optimized architecture, the platform enables high-performance aggregation and filtering across massive event datasets. What distinguishes PostHog is its integrated approach to data pipelines and application control. It features a robust event ingestion system t
Persists data in columnar format to enable high-performance aggregation across massive event datasets.
Druid is a distributed columnar store and online analytical processing database designed for real-time analytics. It functions as a SQL analytics platform and a streaming data ingestion engine, allowing for the analysis of large datasets with low latency to support interactive dashboards and high-concurrency operational workloads. The system integrates a streaming data ingestion engine that loads information via batch or streaming processes to enable immediate analysis of arriving data. It provides high-performance analytical processing to execute slice-and-dice queries on massive data volume
Functions as a distributed columnar database for efficient aggregation and retrieval of massive datasets.
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Employs columnar data structures and specialized table formats to accelerate analytical query performance.
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
Implements a storage engine optimized for analytical queries and efficient disk I/O using columnar data structures.