Why is perspective-dev/perspective a recommended Columnar Analytics repository on GitHub?

Stores strongly typed data in columns to enable high-performance aggregation and filtering of large datasets in memory.

Why is apache/datafusion a recommended Columnar Analytics repository on GitHub?

Processes data in Arrow columnar batches through a streaming pipeline without materializing intermediate results.

Why is paradedb/paradedb a recommended Columnar Analytics repository on GitHub?

Calculates high-performance aggregates, buckets, and facets using specialized columnar storage.

Why is alibaba/AliSQL a recommended Columnar Analytics repository on GitHub?

Embeds a DuckDB columnar engine to execute analytical SQL queries directly on MySQL tables.

Why is google/perfetto a recommended Columnar Analytics repository on GitHub?

Stores parsed trace events in a column-oriented database for fast analytic queries.

Why is Eventual-Inc/Daft a recommended Columnar Analytics repository on GitHub?

Utilizes vectorized columnar processing on contiguous memory blocks to maximize hardware utilization.

Why is grafana/tempo a recommended Columnar Analytics repository on GitHub?

Organizes trace data into columnar Parquet files to enable efficient attribute filtering and high-performance retrieval.

Why is RoaringBitmap/RoaringBitmap a recommended Columnar Analytics repository on GitHub?

Functions as an in-memory analytics tool for performing rapid set calculations on compressed data without requiring full decompression.

9 repositories

Awesome GitHub Repositories · Columnar Analytics

High-performance analytical queries using columnar storage for aggregations, buckets, and facets.

Distinguishing note: None of the candidates cover database-level analytical processing using columnar storage; most focus on performance metrics or search tools.

Explore 9 awesome GitHub repositories matching data & databases · Columnar Analytics. Refine with filters or upvote what's useful.

perspective-dev/perspective
10,981 GitHub stars
Perspective is a columnar data analytics engine and high-performance visualization component powered by WebAssembly. It provides a system for analyzing and visualizing large or streaming datasets through interactive data grids and charts, utilizing a compiled binary to achieve near-native performance within the browser. The project distinguishes itself through a WebSocket-based data streaming interface and deep Apache Arrow integration, which minimize memory overhead when synchronizing tables between servers and clients. It acts as a remote query proxy capable of translating visualization con
Stores strongly typed data in columns to enable high-performance aggregation and filtering of large datasets in memory.
C++ · analytics · bi · data-visualization
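As a rough illustration of the column-oriented layout described for Perspective, the sketch below keeps a small table as one typed array per column and aggregates by scanning only the columns a query needs. It is plain Python built on the standard `array` module, not Perspective's API; the table contents and the `sum_where` helper are hypothetical.

```python
from array import array

# Hypothetical table stored column by column: each column holds one type.
columns = {
    "region": ["east", "west", "east", "north"],  # string column
    "units": array("q", [10, 4, 7, 3]),           # signed 64-bit integers
    "price": array("d", [2.5, 4.0, 1.0, 8.0]),    # 64-bit floats
}


def sum_where(cols, value_col, filter_col, wanted):
    """Sum value_col over the rows where filter_col equals wanted.

    Only the two referenced columns are scanned; the others are never read.
    """
    values = cols[value_col]
    keys = cols[filter_col]
    return sum(v for v, k in zip(values, keys) if k == wanted)


east_units = sum_where(columns, "units", "region", "east")
print(east_units)
```

Because each column is contiguous and uniformly typed, a filter-plus-aggregate query reads two arrays rather than every field of every row, which is the property the summary above attributes to columnar storage.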
apache/datafusion
8,908 GitHub stars
Apache DataFusion is an extensible, columnar SQL query engine that runs embedded within a host application without requiring a separate server process. It processes data in columnar batches using Apache Arrow for memory-efficient analytics, and can scale analytic workloads across multiple nodes for parallel execution. The engine supports both SQL and DataFrame queries through a modular, streaming architecture that allows custom operators, data sources, functions, and optimizer rules. The engine distinguishes itself through its modular extension framework, which enables building custom query e
Processes data in Arrow columnar batches through a streaming pipeline without materializing intermediate results.
Rust · arrow · big-data · dataframe
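The idea of streaming columnar batches without materializing intermediate results can be sketched with Python generators. This is not DataFusion's API: the `Batch` type is a hypothetical stand-in for an Arrow record batch, and `scan`, `filter_even_ids`, and `sum_values` are illustrative names.

```python
from typing import Dict, Iterator, List

Batch = Dict[str, List[int]]  # hypothetical stand-in for an Arrow record batch


def scan(rows: int, batch_size: int) -> Iterator[Batch]:
    """Yield column batches one at a time instead of building the whole table."""
    for start in range(0, rows, batch_size):
        ids = list(range(start, min(start + batch_size, rows)))
        yield {"id": ids, "value": [i * 2 for i in ids]}


def filter_even_ids(batches: Iterator[Batch]) -> Iterator[Batch]:
    """Keep rows whose id is even, one batch at a time."""
    for batch in batches:
        keep = [i for i, v in enumerate(batch["id"]) if v % 2 == 0]
        yield {name: [col[i] for i in keep] for name, col in batch.items()}


def sum_values(batches: Iterator[Batch]) -> int:
    """Consume the stream while holding only a running total."""
    total = 0
    for batch in batches:
        total += sum(batch["value"])
    return total


total = sum_values(filter_even_ids(scan(rows=10, batch_size=4)))
print(total)
```

Each operator pulls one batch from its input, transforms it, and passes it on, so at most one batch per stage is in memory at any moment.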
paradedb/paradedb
8,370 GitHub stars
ParadeDB is a database extension that integrates full-text search, vector database capabilities, and real-time analytics directly into a relational engine. It functions as a plugin that adds new storage and query execution capabilities to an existing database architecture. The project distinguishes itself by supporting hybrid search workflows that combine lexical keyword matching with dense and sparse vector similarity in a single query. It utilizes reciprocal rank fusion to merge these ranked result sets and employs logical replication to synchronize data from external instances, removing th
Calculates high-performance aggregates, buckets, and facets using specialized columnar storage.
Rust · aggregations · analytics · bm25
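Reciprocal rank fusion, mentioned in the description above, merges ranked lists by giving each document a score of 1 / (k + rank) per list and summing across lists. The sketch below is a generic version of that formula, not ParadeDB's implementation; the constant `k = 60` is a commonly used default assumed here, and the document ids and rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids; each list is ordered best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id to keep the order stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["doc_a", "doc_b", "doc_c"]  # hypothetical keyword (BM25) ranking
vector = ["doc_b", "doc_d", "doc_a"]   # hypothetical vector-similarity ranking
fused = reciprocal_rank_fusion([lexical, vector])
print(fused)
```

Documents that appear near the top of both lists accumulate the largest scores, so the fusion needs only ranks and never has to compare raw lexical and vector scores directly.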
MariaDB/server
7,196 GitHub stars
This project is an open source relational database management system and SQL database designed for storing and managing structured data. It functions as a relational database for ensuring consistency and reliability, while also operating as a vector database for storing and querying high-dimensional vector embeddings. The system incorporates a columnar storage engine to optimize analytical query processing and large-scale data aggregation. It further enables vector similarity search, allowing users to find similar items by querying vector embeddings. The software covers a broad capability su
Executes high-performance analytical queries using columnar storage for efficient aggregations.
C++ · amazon-web-services · database · fulltext-search
alibaba/AliSQL
5,706 GitHub stars
AliSQL is a fork of MySQL by Alibaba that extends the relational database management system with enhancements for high performance, scalability, and enterprise-grade availability. It retains the core MySQL identity as a SQL-based database for storing, organizing, and retrieving structured data, while adding optimizations for large-scale transactional and analytical workloads. The project differentiates itself through a set of Alibaba-specific improvements, including a columnar engine for accelerating analytical queries directly on MySQL tables, and a distributed, shared-nothing NDB Cluster en
Embeds a DuckDB columnar engine to execute analytical SQL queries directly on MySQL tables.
C++ · alisql · database · duckdb
google/perfetto
5,558 GitHub stars
Perfetto is a platform for system-level performance tracing and analysis on Linux and Android. It combines a high-throughput trace recorder, a SQL-based query engine, and a browser-based visualizer into a single toolchain. The platform covers CPU scheduling and call-stack profiling, native and Java heap memory allocation tracking, GPU and graphics events, and system-wide counters such as CPU frequency and power consumption. The architecture decouples trace recording from offline analysis, using a compact protobuf format for event encoding and columnar storage for efficient SQL queries. The we
Stores parsed trace events in a column-oriented database for fast analytic queries.
C++
Eventual-Inc/Daft
5,225 GitHub stars
Daft is a distributed dataframe library and multimodal data processor designed to handle large-scale structured and unstructured data. It functions as a vectorized execution engine that processes tables alongside images, audio, and video, utilizing a unified schema to manage diverse data types. The project distinguishes itself by combining distributed data engineering with large-scale AI inference. It provides an AI data pipeline for batch-optimizing model prompts and generating high-dimensional text embeddings, while utilizing zero-copy memory sharing to execute custom Python functions witho
Utilizes vectorized columnar processing on contiguous memory blocks to maximize hardware utilization.
Rust · ai-engineering · ai-pipeline · arrow
grafana/tempo
5,079 GitHub stars
Grafana Tempo is a high-scale distributed tracing backend and columnar trace database. It serves as an observability data store that persists and queries spans and traces using OpenTelemetry standards, allowing for the analysis of request flows across microservices. The system distinguishes itself by using an object-store based backend with columnar Parquet storage. This architecture enables efficient attribute searching and large-scale data retrieval through dedicated attribute columnization and block-based data partitioning. It includes a specialized TraceQL query engine for filtering trace
Organizes trace data into columnar Parquet files to enable efficient attribute filtering and high-performance retrieval.
Go · distributed-tracing · grafana
RoaringBitmap/RoaringBitmap
3,878 GitHub stars
RoaringBitmap is a Java-based library designed for the memory-efficient storage and high-speed querying of large sets of integers. It functions as an in-memory analytics tool that maintains compact data representations while supporting rapid set calculations, such as intersections, unions, and differences. The library distinguishes itself through a hybrid compression strategy that automatically selects between bitsets, sorted arrays, or run-length encoding based on the density of the data. It utilizes a two-level hierarchical index to provide constant-time random access lookups, ensuring perf
Functions as an in-memory analytics tool for performing rapid set calculations on compressed data without requiring full decompression.
Java · bitset · druid · java
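The density-based container choice described above can be sketched in a few lines. This is a simplified Python illustration, not the library's Java code: it splits each 32-bit integer into a high 16-bit key and a low 16-bit value, then stores each group as a sorted array or a bitset. The 4096-element threshold is an assumption made for this sketch, run-length containers are omitted, and `build_containers` and `contains` are hypothetical names.

```python
from bisect import bisect_left

ARRAY_LIMIT = 4096  # assumed array-to-bitset threshold for this sketch


def build_containers(values):
    """Group integers by their high 16 bits and pick a container per group."""
    groups = {}
    for v in values:
        groups.setdefault(v >> 16, set()).add(v & 0xFFFF)
    containers = {}
    for high, lows in groups.items():
        if len(lows) <= ARRAY_LIMIT:
            containers[high] = ("array", sorted(lows))  # sparse group
        else:
            bits = 0
            for low in lows:
                bits |= 1 << low
            containers[high] = ("bitset", bits)  # dense group
    return containers


def contains(containers, value):
    """Membership test: find the container by high bits, then check low bits."""
    entry = containers.get(value >> 16)
    if entry is None:
        return False
    kind, data = entry
    low = value & 0xFFFF
    if kind == "array":
        i = bisect_left(data, low)  # binary search in the sorted array
        return i < len(data) and data[i] == low
    return (data >> low) & 1 == 1


dense_start = 2 << 16  # first value whose high 16 bits equal 2
bitmap = build_containers([1, 5, 70000] + list(range(dense_start, dense_start + 5000)))
print({high: kind for high, (kind, _) in bitmap.items()})
```

In this sketch the two sparse groups stay as short sorted arrays while the dense group of 5,000 values becomes a bitset, which mirrors the trade-off the description attributes to the hybrid strategy.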

Awesome Columnar Analytics GitHub Repositories

perspective-dev/perspective

apache/datafusion

paradedb/paradedb

MariaDB/server

alibaba/AliSQL

google/perfetto

Eventual-Inc/Daft

grafana/tempo

RoaringBitmap/RoaringBitmap
