What are the best open-source alternatives to Arrow?

30 open-source projects similar to apache/arrow, ranked by shared features. Top picks: facebookincubator/velox, apache/pinot, delta-io/delta, pola-rs/polars, cwida/duckdb, apache/fory, dask/dask, oracle/graal, graalvm/graal, duckdb/duckdb.

Is facebookincubator/velox a good alternative to Arrow?

Velox is a high-performance C++ query execution engine and columnar data processing library. It serves as a composable framework for implementing analytical query engines, providing a vectorized expression evaluator and a toolkit for data management systems. The project is distinguished by its use…

Is apache/pinot a good alternative to Arrow?

Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system arch…

Is delta-io/delta a good alternative to Arrow?

Delta is a lakehouse table format that brings ACID transactions and data warehouse consistency to large scale data lakes on cloud object storage. It serves as an ACID transaction manager, coordinating atomic commits and serializable isolation for concurrent reads and writes across distributed compu…

Is pola-rs/polars a good alternative to Arrow?

Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorize…

Is cwida/duckdb a good alternative to Arrow?

DuckDB is an embedded, in-process analytical SQL database and OLAP database management system. It functions as a data engine for Parquet and CSV files, allowing users to execute complex SQL queries on large datasets without requiring a separate server process. The system is designed for local anal…

Is apache/fory a good alternative to Arrow?

Fory is a cross-language serialization framework and binary data serializer designed to convert complex object graphs into a compact binary format for high-performance data exchange. It includes an IDL-based schema compiler to transform interface definition language files into type-safe native data…

Is dask/dask a good alternative to Arrow?

Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acy…

Is oracle/graal a good alternative to Arrow?

GraalVM is a polyglot virtual machine and high-performance runtime designed to execute multiple programming languages within a single environment. It functions as a JVM language toolkit for building language implementations, a native image compiler for transforming bytecode into standalone binaries…

Is graalvm/graal a good alternative to Arrow?

Graal is a compiler and runtime architecture designed for high-performance execution and polyglot interoperability. It utilizes a graph-based representation of program logic to perform global optimizations and JIT compilation. The project features a meta-circular interpretation framework and a spe…

Is duckdb/duckdb a good alternative to Arrow?

DuckDB is an in-process analytical database engine designed to run directly within an application process. As a zero-dependency, embedded system, it provides enterprise-grade SQL data processing capabilities without the overhead of managing a dedicated database server. It is built to handle complex…


Open-source alternatives to Arrow

30 open-source projects similar to apache/arrow, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Arrow alternative.

facebookincubator/velox
4,155 stars
Velox is a high-performance C++ query execution engine and columnar data processing library. It serves as a composable framework for implementing analytical query engines, providing a vectorized expression evaluator and a toolkit for data management systems. The project is distinguished by its use of vectorized columnar execution and arena-based memory allocation to process large-scale datasets. It features specialized optimizations such as broadcast join table caching, dynamic filter push-down, and dictionary encoding to reduce memory overhead and accelerate analytical reads. The engine cov
C++
apache/pinot
6,098 stars
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Java
delta-io/delta
8,596 stars
Delta is a lakehouse table format that brings ACID transactions and data warehouse consistency to large scale data lakes on cloud object storage. It serves as an ACID transaction manager, coordinating atomic commits and serializable isolation for concurrent reads and writes across distributed compute engines. The project provides a multi-engine interoperability layer that uses format translation to allow diverse SQL engines and processing frameworks to read and write the same tables. It functions as a data versioning system, utilizing a transaction log to enable time travel, historical snapsh
Scala
acid, analytics, big-data

All 30 open-source alternatives to Arrow

facebookincubator/velox

apache/pinot

delta-io/delta

pola-rs/polars

cwida/duckdb

apache/fory

dask/dask

oracle/graal

graalvm/graal

duckdb/duckdb

rougier/numpy-100

toon-format/toon

apache/thrift

dusty-nv/jetson-inference

trinodb/trino

cupy/cupy

jpmorganchase/python-training

iii-hq/iii

MessagePack-CSharp/MessagePack-CSharp

google/flatbuffers

lijin-THU/notes-python

asynkron/protoactor-go

golang/protobuf

apache/flink

apache/hive

hyperium/hyper

ml-explore/mlx

albumentations-team/albumentations

tigerbeetle/tigerbeetle

apache/iceberg