What are the best open-source alternatives to Beam?

30 open-source projects similar to apache/beam, ranked by shared features. Top picks: apache/flink, hazelcast/hazelcast, apache/spark, risingwavelabs/risingwave, aws/aws-cdk, vonng/ddia, apache/pinot, apache/hadoop, infinyon/fluvio, delta-io/delta.

Is apache/flink a good alternative to Beam?

Apache Flink is a distributed processing engine designed for both high-throughput, low-latency data streams and finite batch workloads. It functions as a stateful stream processor and a SQL stream processing engine, providing a unified runtime to execute relational queries and event-based transform…

Is hazelcast/hazelcast a good alternative to Beam?

Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency acc…

Is apache/spark a good alternative to Beam?

Apache Spark is a unified distributed data processing engine designed for large-scale data analysis and computation graphs. It functions as a distributed machine learning framework, a graph processing system, a real-time stream processor, and a SQL analytics engine. The system enables the executio…

Is risingwavelabs/risingwave a good alternative to Beam?

RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open…
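Streaming databases such as RisingWave keep SQL query results current by updating them incrementally as changes arrive, instead of re-running the query. Below is a minimal sketch of that idea for a single `GROUP BY` aggregate. The `IncrementalGroupBySum` class and its changelog format, `('+', key, amount)` for inserts and `('-', key, amount)` for deletes, are hypothetical. This is not RisingWave's internal operator.

```python
from collections import defaultdict


class IncrementalGroupBySum:
    """Keeps SELECT key, COUNT(*), SUM(amount) FROM t GROUP BY key up to date.

    Each change costs O(1): the view is patched instead of recomputed.
    """

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(int)

    def apply(self, op, key, amount):
        if op not in ("+", "-"):
            raise ValueError(f"unknown change type: {op!r}")
        sign = 1 if op == "+" else -1
        self.count[key] += sign
        self.total[key] += sign * amount
        if self.count[key] == 0:  # last row of the group was deleted
            del self.count[key]
            del self.total[key]

    def rows(self):
        return {k: (self.count[k], self.total[k]) for k in sorted(self.count)}


view = IncrementalGroupBySum()
for change in [("+", "eu", 10), ("+", "us", 5), ("+", "eu", 7), ("-", "us", 5)]:
    view.apply(*change)
print(view.rows())  # {'eu': (2, 17)}
```

The delete for `us` removes that group entirely, so the view ends up with only the `eu` row.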

Is aws/aws-cdk a good alternative to Beam?

The AWS Cloud Development Kit is an infrastructure-as-code framework that enables developers to define and provision cloud resources using familiar programming languages. By utilizing construct-based synthesis, it translates high-level, object-oriented code into declarative templates, allowing for…
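One way to picture construct-based synthesis is as a walk over a tree of objects that emits a single declarative template. The `Construct`, `Resource`, and `VersionedBucket` classes below are hypothetical stand-ins, not the aws-cdk-lib API. The logical IDs are also simplified, because the real CDK appends a hash. Only the output shape, a `Resources` map of `Type` and `Properties`, follows CloudFormation.

```python
class Construct:
    """Hypothetical construct: a named node in a tree that can contribute resources."""

    def __init__(self, scope, construct_id):
        self.id = construct_id
        self.children = []
        # Root has an empty path; children concatenate ids, e.g. "Logs" + "Bucket"
        self.path = "" if scope is None else scope.path + construct_id
        if scope is not None:
            scope.children.append(self)

    def resources(self):
        out = {}
        for child in self.children:
            out.update(child.resources())
        return out


class Resource(Construct):
    """Leaf construct that maps to exactly one template resource."""

    def __init__(self, scope, construct_id, resource_type, properties):
        super().__init__(scope, construct_id)
        self.resource_type = resource_type
        self.properties = properties

    def resources(self):
        out = {self.path: {"Type": self.resource_type, "Properties": self.properties}}
        out.update(super().resources())
        return out


class VersionedBucket(Construct):
    """Higher-level construct: hides a low-level property setting behind a class."""

    def __init__(self, scope, construct_id):
        super().__init__(scope, construct_id)
        Resource(self, "Bucket", "AWS::S3::Bucket",
                 {"VersioningConfiguration": {"Status": "Enabled"}})


def synth(root):
    # Walk the tree once and emit a declarative, CloudFormation-shaped template
    return {"Resources": root.resources()}


app = Construct(None, "App")
VersionedBucket(app, "Logs")
print(synth(app))
```

The caller writes object-oriented code (`VersionedBucket(app, "Logs")`), and synthesis turns it into plain data that a deployment engine can consume.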

Is vonng/ddia a good alternative to Beam?

This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent i…

Is apache/pinot a good alternative to Beam?

Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system arch…
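Columnar storage lets an OLAP store answer aggregations without reading whole rows. The toy `ColumnarTable` below (a hypothetical name) keeps each column in its own list and touches only the filter and aggregate columns. Pinot itself adds encodings and indexes that this sketch omits.

```python
class ColumnarTable:
    """Toy column store: each column lives in its own list."""

    def __init__(self, column_names, rows):
        self.columns = {name: [] for name in column_names}
        for row in rows:
            if len(row) != len(column_names):
                raise ValueError("row does not match the schema")
            for name, value in zip(column_names, row):
                self.columns[name].append(value)

    def sum_where(self, agg_col, filter_col, predicate):
        # Only two columns are read; the rest of each row is never touched
        filter_values = self.columns[filter_col]
        agg_values = self.columns[agg_col]
        return sum(agg_values[i] for i, v in enumerate(filter_values) if predicate(v))


table = ColumnarTable(
    ["country", "clicks", "device"],
    [("us", 3, "ios"), ("de", 5, "web"), ("us", 4, "web")],
)
print(table.sum_where("clicks", "country", lambda c: c == "us"))  # 7
```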

Is apache/hadoop a good alternative to Beam?

Hadoop is a big data infrastructure suite and distributed data processing framework designed to store and process massive datasets across clusters of computers. It consists of a distributed storage system for managing large files across multiple nodes and a parallel computing engine for processing…
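Hadoop's parallel processing model is MapReduce. The classic word-count job can simulate its map, shuffle, and reduce phases on one machine. The function names below are illustrative. In Hadoop, each phase runs as distributed tasks over input splits.

```python
from collections import defaultdict
from itertools import chain


def map_phase(text):
    # Map: emit (word, 1) for every word in one input split
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key
    return key, sum(values)


def word_count(splits):
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    return dict(reduce_phase(k, vs) for k, vs in sorted(shuffle(intermediate).items()))


print(word_count(["the quick fox", "The lazy dog", "fox"]))
# {'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 2}
```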

Is infinyon/fluvio a good alternative to Beam?

Fluvio is a distributed event streaming platform and cloud-native streaming engine designed for collecting, persisting, and replicating real-time data streams across a distributed cluster. It functions as a real-time data pipeline for building stateful workflows that ingest, enrich, and export data…
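Event streaming platforms like this one persist records in append-only, partitioned logs that consumers read by offset. The minimal sketch below works under that assumption. It uses a hypothetical `Topic` class and CRC32 key hashing, leaves out replication, and is not Fluvio's client API.

```python
import zlib


class Topic:
    """Toy append-only topic with keyed partitioning (no replication)."""

    def __init__(self, partitions):
        self.logs = [[] for _ in range(partitions)]

    def produce(self, key, value):
        # Same key -> same partition, which keeps per-key ordering
        p = zlib.crc32(key.encode()) % len(self.logs)
        self.logs[p].append(value)
        return p, len(self.logs[p]) - 1  # (partition, offset) of the new record

    def consume(self, partition, offset, max_records=10):
        # Reads never delete records; each consumer tracks its own next offset
        return self.logs[partition][offset:offset + max_records]


topic = Topic(partitions=3)
p1, o1 = topic.produce("sensor-1", "temp=20")
p2, o2 = topic.produce("sensor-1", "temp=21")
print(p1 == p2, o1, o2)              # True 0 1
print(topic.consume(p1, offset=0))   # ['temp=20', 'temp=21']
```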

Is delta-io/delta a good alternative to Beam?

Delta is a lakehouse table format that brings ACID transactions and data warehouse consistency to large-scale data lakes on cloud object storage. It serves as an ACID transaction manager, coordinating atomic commits and serializable isolation for concurrent reads and writes across distributed compu…
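Delta records each commit as a numbered entry in a transaction log, and concurrent writers race to create the next entry. The sketch below simplifies that optimistic-concurrency scheme. It assumes a hypothetical in-memory `TransactionLog` whose `try_commit` behaves as a put-if-absent. A real writer that loses the race re-reads the log, checks for logical conflicts, and retries.

```python
class TransactionLog:
    """Toy commit log: version N is won by whichever writer records it first."""

    def __init__(self):
        self.commits = {}  # version -> list of (action, file path)

    def latest_version(self):
        return max(self.commits, default=-1)

    def try_commit(self, read_version, actions):
        # Put-if-absent on the next version: fails if another writer got there first
        new_version = read_version + 1
        if new_version in self.commits:
            return None
        self.commits[new_version] = list(actions)
        return new_version

    def snapshot(self, version=None):
        # Replay add/remove actions in order to get the table's live files
        version = self.latest_version() if version is None else version
        files = set()
        for v in range(version + 1):
            for action, path in self.commits[v]:
                if action == "add":
                    files.add(path)
                elif action == "remove":
                    files.discard(path)
        return files


log = TransactionLog()
log.try_commit(-1, [("add", "part-0.parquet")])
v = log.latest_version()                                  # both writers read version 0
print(log.try_commit(v, [("add", "part-1.parquet")]))     # 1
print(log.try_commit(v, [("remove", "part-0.parquet")]))  # None: must re-read and retry
print(sorted(log.snapshot()))                             # ['part-0.parquet', 'part-1.parquet']
```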


Open-source alternatives to Beam

30 open-source projects similar to apache/beam, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Beam alternative.

apache/flink
Java · 26,086 stars
Apache Flink is a distributed processing engine designed for both high-throughput, low-latency data streams and finite batch workloads. It functions as a stateful stream processor and a SQL stream processing engine, providing a unified runtime to execute relational queries and event-based transformations. The system is distinguished by its ability to manage persistent operator state to ensure exactly-once processing guarantees and consistency during failures. It features specialized capabilities for complex event processing to detect temporal patterns and handles out-of-order events using eve…
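Flink's handling of out-of-order events is usually explained through event time and watermarks. The sketch below is plain Python, not Flink's API. It assumes tumbling windows and a bounded-out-of-orderness watermark, meaning the largest timestamp seen minus a fixed delay. The `tumbling_event_time_windows` helper is hypothetical, and it returns late events separately instead of updating windows that have already fired.

```python
from collections import defaultdict


def tumbling_event_time_windows(events, window_size, max_out_of_orderness):
    """Sum values per key in tumbling event-time windows.

    events: iterable of (timestamp, key, value), possibly out of order.
    The watermark is the largest timestamp seen minus max_out_of_orderness;
    a window [start, start + window_size) fires once the watermark reaches its end.
    Returns (emitted, late): emitted is a list of (window_start, key, total),
    late holds events whose window had already fired.
    """
    open_windows = defaultdict(lambda: defaultdict(int))  # start -> key -> running sum
    emitted, late = [], []
    max_ts = None
    watermark = float("-inf")

    for ts, key, value in events:
        start = ts - ts % window_size
        if start + window_size <= watermark:
            late.append((ts, key, value))  # window already fired
            continue
        open_windows[start][key] += value
        max_ts = ts if max_ts is None else max(max_ts, ts)
        watermark = max_ts - max_out_of_orderness
        # Fire every window whose end the watermark has reached
        for s in sorted(w for w in open_windows if w + window_size <= watermark):
            for k, total in sorted(open_windows.pop(s).items()):
                emitted.append((s, k, total))

    # End of input acts as a final watermark of +infinity
    for s in sorted(open_windows):
        for k, total in sorted(open_windows[s].items()):
            emitted.append((s, k, total))
    return emitted, late


events = [(1, "a", 1), (12, "a", 1), (8, "a", 1), (17, "a", 1), (3, "a", 1), (25, "a", 1)]
emitted, late = tumbling_event_time_windows(events, window_size=10, max_out_of_orderness=5)
print(emitted)  # [(0, 'a', 2), (10, 'a', 2), (20, 'a', 1)]
print(late)     # [(3, 'a', 1)]
```

The event at timestamp 8 arrives after timestamp 12 but is still counted, because the watermark (7) has not yet reached the end of window [0, 10). The event at timestamp 3 arrives after the watermark has moved to 12, so it is reported as late.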
hazelcast/hazelcast
Java · 6,570 stars · Topics: big-data, caching, data-in-motion
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis…
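The partitioned, replicated key-value store described above can be sketched as a toy in-memory class. The class, its round-robin replica placement, and the MD5-based hash are illustrative assumptions. Only the default of 271 partitions matches Hazelcast's documented default. Real clusters also promote backups and migrate partitions after a member fails, which this sketch skips.

```python
import hashlib


class PartitionedStore:
    """Toy partitioned key-value store with synchronous backups (illustrative only)."""

    def __init__(self, members, partition_count=271, backup_count=1):
        if backup_count >= len(members):
            raise ValueError("backup_count must be smaller than the number of members")
        self.members = list(members)
        self.partition_count = partition_count
        self.backup_count = backup_count
        self.data = {m: {} for m in self.members}  # live member -> its key/value copies

    def partition_id(self, key):
        # Stable hash, so every node maps a key to the same partition
        digest = hashlib.md5(str(key).encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.partition_count

    def replicas(self, key):
        # Owner first, then backups on the next members (simplified placement)
        pid = self.partition_id(key)
        n = len(self.members)
        return [self.members[(pid + i) % n] for i in range(self.backup_count + 1)]

    def put(self, key, value):
        for member in self.replicas(key):
            if member in self.data:  # skip members that have failed
                self.data[member][key] = value

    def get(self, key):
        # Read from the first replica that is still alive
        for member in self.replicas(key):
            if member in self.data:
                return self.data[member].get(key)
        return None

    def fail(self, member):
        # Simulate a crash: the member's in-memory copies are lost
        del self.data[member]


store = PartitionedStore(["node-a", "node-b", "node-c"])
store.put("user:42", "alice")
owner, backup = store.replicas("user:42")
store.fail(owner)
print(store.get("user:42"))  # alice, served by the backup
```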
apache/spark
Scala · 43,467 stars · Topics: big-data, java, jdbc
Apache Spark is a unified distributed data processing engine designed for large-scale data analysis and computation graphs. It functions as a distributed machine learning framework, a graph processing system, a real-time stream processor, and a SQL analytics engine. The system enables the execution of distributed SQL querying, large-scale graph analysis, and real-time stream analytics across clusters of machines. It also provides a scalable environment for implementing machine learning algorithms and predictive model development on massive datasets. The engine incorporates relational query e…

All 30 alternatives to Beam, ranked by shared features

apache/flink

hazelcast/hazelcast

apache/spark

risingwavelabs/risingwave

aws/aws-cdk

Vonng/ddia

apache/pinot

apache/hadoop

infinyon/fluvio

delta-io/delta

spotify/luigi

flyteorg/flyte

boto/boto3

Unstructured-IO/unstructured

ray-project/ray

dagster-io/dagster

apache/kafka

nathanmarz/storm

JerryLead/SparkInternals

oxnr/awesome-bigdata

databricks/Spark-The-Definitive-Guide

PaddlePaddle/Serving

apache/ignite

DTStack/chunjun

Eventual-Inc/Daft

donnemartin/data-science-ipython-notebooks

go-task/task

apache/hive

JuliaPluto/Pluto.jl

zhisheng17/flink-learning