Arrow

Features

Columnar Formats - Provides a language-independent standard for organizing flat or nested data in memory.
Data Analytics Engines - Functions as a high-performance engine for executing complex queries directly on memory-resident datasets.
Data Format Interoperability - Provides a standardized interface for reading and writing data across diverse file formats like Parquet, ORC, and CSV.
Memory Formats - Structures data into a language-independent columnar format to accelerate analytical operations and improve memory efficiency.
Columnar Data Processors - Organizes large datasets into memory-efficient columnar structures to accelerate analytical queries.
Language-Neutral Data Serialization - Provides a framework for moving large datasets between systems using shared memory to eliminate serialization overhead.
Memory Layouts - Organizes data in contiguous memory blocks to maximize CPU cache efficiency and enable vectorized processing.
Serialization Frameworks - Implements a high-performance framework for serializing and deserializing structured data with schema-driven efficiency.
Vectorized Execution Engines - Operates on batches of data using computational kernels to optimize CPU usage for analytical queries.
Language Interoperability - Standardizes data formats across different programming languages to ensure seamless communication.
In-Process Analytics - Executes complex data queries and processing tasks directly on memory-resident datasets.
Shared Memory Transports - Provides zero-copy communication mechanisms for efficient data access across multiple processes.
Language-Agnostic Runtimes - Provides a standardized memory format that allows different programming languages to read and write data without translation overhead.
Vectorized Array Operations - Performs mathematical operations on entire arrays of data to leverage modern processor instruction sets.
Zero-Copy Mechanisms - Enables multiple processes to access shared memory buffers simultaneously without serialization or duplication overhead.
Data Storage Systems - Provides in-memory columnar data representation.
Data Engineering - Defines a columnar format for fast data interchange.
Developer Tools - Offers a multi-language toolbox for accelerated data interchange.
Serialization Libraries - Serves as a cross-language development platform for in-memory data.
Data Import and Export - Ingests and exports information across standard file types like CSV, ORC, and Parquet.
Data Serialization Formats - Defines structured metadata for nested and flat data types to facilitate efficient reading from various file formats.
Schema Metadata Utilities - Manages structured metadata to define data layouts for efficient reading across different systems.
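
Several of the features above come down to one idea: a column is stored as a few contiguous buffers rather than as a list of row objects. The sketch below illustrates that idea for a nullable string column using only the Python standard library. It is not the Arrow library's API, and the function names encode_utf8_column and read_slot are hypothetical. It assumes the commonly documented layout of a validity bitmap (one bit per slot, least-significant bit first), an int32 offsets buffer with one more entry than there are slots, and a single data buffer. Buffer padding and alignment are left out for brevity.

```python
import struct


def encode_utf8_column(values):
    """Pack a list of str-or-None into three contiguous buffers.

    Hypothetical helper for illustration only, not part of any Arrow API.
    Padding and alignment of the buffers are omitted in this sketch.
    """
    n = len(values)
    validity = bytearray((n + 7) // 8)  # one bit per slot, LSB first
    offsets = [0]                       # n + 1 offsets into the data buffer
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            validity[i // 8] |= 1 << (i % 8)  # mark the slot as non-null
            data += value.encode("utf-8")
        # A null or empty slot repeats the previous offset (zero length).
        offsets.append(len(data))
    offsets_buf = struct.pack(f"<{n + 1}i", *offsets)  # little-endian int32
    return bytes(validity), offsets_buf, bytes(data)


def read_slot(buffers, i):
    """Read slot i straight from the buffers without decoding other slots."""
    validity, offsets_buf, data = buffers
    if not (validity[i // 8] >> (i % 8)) & 1:
        return None
    start, end = struct.unpack_from("<2i", offsets_buf, 4 * i)
    return data[start:end].decode("utf-8")


if __name__ == "__main__":
    buffers = encode_utf8_column(["arrow", None, "", "parquet"])
    print([read_slot(buffers, i) for i in range(4)])
```

Because each slot is located by bit arithmetic and two offset reads, any language that agrees on this byte layout can read the same buffers without a translation step, which is the property the interoperability and zero-copy entries describe.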

Open-source alternatives to Arrow

Similar open-source projects, ranked by how many features they share with Arrow.

facebookincubator/velox
4,155 GitHub stars
Velox is a high-performance C++ query execution engine and columnar data processing library. It serves as a composable framework for implementing analytical query engines, providing a vectorized expression evaluator and a toolkit for data management systems. The project is distinguished by its use of vectorized columnar execution and arena-based memory allocation to process large-scale datasets. It features specialized optimizations such as broadcast join table caching, dynamic filter push-down, and dictionary encoding to reduce memory overhead and accelerate analytical reads. The engine cov
C++
apache/pinot
6,098 GitHub stars
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Java
delta-io/delta
8,596 GitHub stars
Delta is a lakehouse table format that brings ACID transactions and data warehouse consistency to large scale data lakes on cloud object storage. It serves as an ACID transaction manager, coordinating atomic commits and serializable isolation for concurrent reads and writes across distributed compute engines. The project provides a multi-engine interoperability layer that uses format translation to allow diverse SQL engines and processing frameworks to read and write the same tables. It functions as a data versioning system, utilizing a transaction log to enable time travel, historical snapsh
Scala
Tags: acid, analytics, big-data
pola-rs/polars
38,855 GitHub stars
Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorized operations. The engine is built to handle large-scale tabular datasets, providing both local and distributed analytical runtimes that scale from single-machine environments to multi-node clusters. The project distinguishes itself through a sophisticated lazy query engine that constructs abstract e
Rust
Tags: arrow, dataframe, dataframe-library

See all 30 alternatives to Arrow
