6 repositories
Storage engines optimized for analytical queries and large-scale data aggregation using columnar data structures.
Explore 6 awesome GitHub repositories matching Data & Databases · Columnar Databases.
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through ad
Provides a high-performance storage engine specifically architected for real-time analytical query execution and large-scale data aggregation.
DuckDB is an embedded, in-process analytical SQL database and OLAP database management system. It functions as a data engine for Parquet and CSV files, allowing users to run complex SQL queries on large datasets without requiring a separate server process. The system is designed for local analytical processing and embedded data science workflows. It allows Parquet and CSV files to be queried and analyzed directly on disk, avoiding the need to load the data into a persistent database. The engine provides high-performance analytical SQL execution, including support for window functions and nested subqueries. It incorporates a columnar storage layout and vectorized query execution to handle large-scale data manipulation and exploration. The database is accessible through a standalone command-line interface and language-specific bindings for Python, R, Java, and Wasm.
Implements a storage engine optimized for analytical queries using columnar data structures.
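The idea behind a columnar layout with chunk-at-a-time execution can be illustrated with a minimal standard-library sketch. This is not how DuckDB or any engine listed here is implemented; the sample data and the helper names `to_columns` and `sum_where` are hypothetical and exist only for illustration. The sketch pivots row records into one sequence per column, then answers a filtered aggregation by reading only the two columns the query references, in small chunks.

```python
from array import array

# Row-oriented layout: each record keeps all of its fields together.
# The sample data below is made up for illustration.
rows = [
    {"user_id": 1, "country": "RO", "amount": 10.0},
    {"user_id": 2, "country": "DE", "amount": 25.5},
    {"user_id": 3, "country": "RO", "amount": 4.5},
    {"user_id": 4, "country": "US", "amount": 60.0},
]


def to_columns(records):
    """Pivot row dicts into one sequence per column (hypothetical helper)."""
    return {
        "user_id": array("q", (r["user_id"] for r in records)),  # packed 64-bit ints
        "country": [r["country"] for r in records],
        "amount": array("d", (r["amount"] for r in records)),    # packed doubles
    }


def sum_where(columns, filter_col, filter_value, sum_col, chunk_size=2):
    """Sum sum_col over rows where filter_col equals filter_value.

    Only the two referenced columns are read, and they are processed
    one chunk at a time rather than one row at a time.
    """
    keys = columns[filter_col]
    values = columns[sum_col]
    total = 0.0
    for start in range(0, len(keys), chunk_size):
        key_chunk = keys[start:start + chunk_size]
        value_chunk = values[start:start + chunk_size]
        total += sum(v for k, v in zip(key_chunk, value_chunk) if k == filter_value)
    return total


columns = to_columns(rows)
# Roughly: SELECT sum(amount) FROM rows WHERE country = 'RO'
ro_total = sum_where(columns, "country", "RO", "amount")
print(ro_total)
```

The `user_id` column is never touched by the query, which is the main reason analytical scans over wide tables tend to be cheaper in a columnar layout than in a row-oriented one.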
PostHog is a comprehensive product analytics and feature management platform designed to capture, process, and visualize user behavior data. It provides a unified suite for tracking application events, managing feature rollouts, and monitoring system health through session recordings and error tracking. By leveraging a columnar-storage-optimized architecture, the platform enables high-performance aggregation and filtering across massive event datasets. What distinguishes PostHog is its integrated approach to data pipelines and application control. It features a robust event ingestion system t
Persists data in columnar format to enable high-performance aggregation across massive event datasets.
Druid is a distributed columnar store and online analytical processing database designed for real-time analytics. It functions as a SQL analytics platform and a streaming data ingestion engine, allowing for the analysis of large datasets with low latency to support interactive dashboards and high-concurrency operational workloads. The system integrates a streaming data ingestion engine that loads information via batch or streaming processes to enable immediate analysis of arriving data. It provides high-performance analytical processing to execute slice-and-dice queries on massive data volume
Functions as a distributed columnar database for efficient aggregation and retrieval of massive datasets.
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Employs columnar data structures and specialized table formats to accelerate analytical query performance.
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
Implements a storage engine optimized for analytical queries and efficient disk I/O using columnar data structures.