Velox is a high-performance C++ query execution engine and columnar data processing library. It serves as a composable framework for implementing analytical query engines, providing a vectorized expression evaluator and a toolkit for data management systems. The project is distinguished by its use of vectorized columnar execution and arena-based memory allocation to process large-scale datasets. It features specialized optimizations such as broadcast join table caching, dynamic filter push-down, and dictionary encoding to reduce memory overhead and accelerate analytical reads. The engine cov
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Delta is a lakehouse table format that brings ACID transactions and data warehouse consistency to large scale data lakes on cloud object storage. It serves as an ACID transaction manager, coordinating atomic commits and serializable isolation for concurrent reads and writes across distributed compute engines. The project provides a multi-engine interoperability layer that uses format translation to allow diverse SQL engines and processing frameworks to read and write the same tables. It functions as a data versioning system, utilizing a transaction log to enable time travel, historical snapsh
Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorized operations. The engine is built to handle large-scale tabular datasets, providing both local and distributed analytical runtimes that scale from single-machine environments to multi-node clusters. The project distinguishes itself through a sophisticated lazy query engine that constructs abstract e