Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Flink CDC (apache/flink-cdc) is a streaming data integration framework that captures real-time database changes and synchronizes them with downstream systems. It operates as a distributed streaming ETL and database synchronizer, reading database logs and snapshots to propagate row-level modifications to target sinks. The system supports declarative data integration, allowing users to define source-to-sink data flows using SQL or YAML configurations. It distinguishes itself by automating schema evolution to maintain synchronization when source structures change and ensuring exactly-once delivery and processin
DataX is a distributed data integration framework and plugin-based ETL tool designed for synchronizing large datasets between heterogeneous sources and destinations. It functions as a JDBC data migration engine and offline synchronization tool, enabling the movement of data between relational databases, NoSQL stores, and object storage. The system utilizes a plugin-based connector architecture that decouples reader and writer logic, allowing it to map and transform data types across different storage engines using a standardized internal representation. This design supports heterogeneous data
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Chunjun is a distributed data integration framework for building SQL-based ETL pipelines that synchronize data between heterogeneous sources. It also serves as a change data capture tool, using a distributed processing environment to move and transform data across different database types.
The main features of dtstack/chunjun are: distributed data processing, heterogeneous data synchronization, change data capture, checkpoints and recovery, distributed cluster execution, incremental data synchronization, and SQL-based pipeline definitions.
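To make "incremental data synchronization" and "checkpoints and recovery" concrete, here is a minimal Python sketch of the general technique. It is not Chunjun's API or implementation: the `users` table, the JSON checkpoint file, and the function names `load_checkpoint`, `save_checkpoint`, `sync_once`, and `demo` are all hypothetical, and SQLite stands in for both the source and the sink.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> int:
    """Return the last synchronized id, or 0 when no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())["last_id"]
    return 0


def save_checkpoint(path: Path, last_id: int) -> None:
    """Persist the highest id that has been written to the sink."""
    path.write_text(json.dumps({"last_id": last_id}))


def sync_once(source, sink, checkpoint_path: Path, batch_size: int = 2) -> int:
    """Copy one batch of rows newer than the checkpoint; return rows copied."""
    last_id = load_checkpoint(checkpoint_path)
    rows = source.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    if not rows:
        return 0
    with sink:  # commits on success, rolls back on error
        # INSERT OR REPLACE keeps a replayed batch idempotent if the process
        # stops after the sink commit but before the checkpoint is saved.
        sink.executemany(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", rows
        )
    save_checkpoint(checkpoint_path, rows[-1][0])
    return len(rows)


def demo():
    """Run the sketch against two in-memory databases (illustrative data)."""
    source = sqlite3.connect(":memory:")
    sink = sqlite3.connect(":memory:")
    for db in (source, sink):
        db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany(
        "INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob"), (3, "cy")]
    )
    source.commit()

    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "checkpoint.json"
        counts = []
        while True:  # drain the initial backlog batch by batch
            copied = sync_once(source, sink, checkpoint)
            counts.append(copied)
            if copied == 0:
                break
        # A new row arrives later; only that row is copied on the next run.
        source.execute("INSERT INTO users VALUES (4, 'dee')")
        source.commit()
        counts.append(sync_once(source, sink, checkpoint))
        final_checkpoint = load_checkpoint(checkpoint)

    rows = sink.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    return counts, rows, final_checkpoint
```

The sketch writes to the sink first and saves the checkpoint second, so a failure between the two steps causes a batch to be replayed rather than skipped. The idempotent write makes that replay harmless. A distributed framework would coordinate this across many workers, which the sketch does not attempt.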
Open-source alternatives to dtstack/chunjun include:
hazelcast/hazelcast: Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to…
apache/flink-cdc: This project is a streaming data integration framework that captures real-time database changes and synchronizes them…
alibaba/datax: DataX is a distributed data integration framework and plugin-based ETL tool designed for synchronizing large datasets…
risingwavelabs/risingwave: RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process…
dlt-hub/dlt: dlt is a Python data ingestion tool and ETL pipeline framework designed to fetch data from diverse sources and persist…
jerrylead/sparkinternals: SparkInternals is a technical reference and architecture guide detailing the internal design and implementation of the…