6 Repos
Storage engines optimized for analytical queries and large-scale data aggregation using columnar data structures.
Explore 6 GitHub repositories in Data & Databases · Columnar Databases.
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through ad
Provides a high-performance storage engine specifically architected for real-time analytical query execution and large-scale data aggregation.
DuckDB is an embedded, in-process analytical SQL database and OLAP database management system. It serves as a data engine for Parquet and CSV files, letting users run complex SQL queries on large datasets without a separate server process. The system is designed for local analytical processing and embedded data science workflows. It can query and analyze Parquet and CSV files directly from disk, which removes the need to load data into a persistent database first. The engine provides high-performance analytical SQL execution, including support for window functions and nested subqueries. It uses a columnar storage layout and vectorized query execution to handle large-scale data manipulation and analysis. The database is accessible through a standalone command-line interface and language-specific bindings for Python, R, Java, and Wasm.
Implements a storage engine optimized for analytical queries using columnar data structures.
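To make the "columnar data structures" idea shared by the entries in this category concrete, here is a minimal Python sketch. It is not how DuckDB or any other listed project is implemented. The sample data and the names `rows`, `columns`, `sum_row_store`, `sum_column_store`, and `sum_where` are all hypothetical, chosen only to contrast a row layout with a column layout.

```python
from array import array

# Row-oriented layout: one tuple per record (hypothetical sample events).
rows = [
    ("click", "DE", 120),
    ("view", "US", 45),
    ("click", "US", 300),
    ("view", "DE", 80),
]

# Column-oriented layout: one sequence per field. The numeric column is
# stored in a typed, contiguous array of signed 64-bit integers.
columns = {
    "event": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "duration_ms": array("q", (r[2] for r in rows)),
}


def sum_row_store(rows):
    """Aggregate one field by walking every full record."""
    return sum(record[2] for record in rows)


def sum_column_store(columns):
    """Aggregate one field by scanning only that field's column."""
    return sum(columns["duration_ms"])


def sum_where(columns, filter_col, value, agg_col):
    """Filter on one column, aggregate another; other columns are never read."""
    mask = [v == value for v in columns[filter_col]]
    return sum(x for x, keep in zip(columns[agg_col], mask) if keep)


total = sum_column_store(columns)
click_total = sum_where(columns, "event", "click", "duration_ms")
```

Both layouts return the same totals. The difference is in what gets read: the column-oriented functions touch only the fields named in the query, which is the property analytical engines rely on when aggregating a few columns across many rows.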
PostHog is a comprehensive product analytics and feature management platform designed to capture, process, and visualize user behavior data. It provides a unified suite for tracking application events, managing feature rollouts, and monitoring system health through session recordings and error tracking. By leveraging a columnar-storage-optimized architecture, the platform enables high-performance aggregation and filtering across massive event datasets. What distinguishes PostHog is its integrated approach to data pipelines and application control. It features a robust event ingestion system t
Persists data in columnar format to enable high-performance aggregation across massive event datasets.
Druid is a distributed columnar store and online analytical processing (OLAP) database designed for real-time analytics. It functions as a SQL analytics platform, analyzing large datasets with low latency to support interactive dashboards and high-concurrency operational workloads. Its ingestion engine loads data via batch or streaming processes so that arriving data can be analyzed immediately. It provides high-performance analytical processing to execute slice-and-dice queries on massive data volume
Functions as a distributed columnar database for efficient aggregation and retrieval of massive datasets.
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Employs columnar data structures and specialized table formats to accelerate analytical query performance.
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
Implements a storage engine optimized for analytical queries and efficient disk I/O using columnar data structures.