This project is a streaming data integration framework that captures real-time database changes and synchronizes them with downstream systems. It operates as a distributed streaming ETL tool and database synchronizer, reading database logs and snapshots and propagating row-level changes to target sinks.
The system supports declarative data integration, allowing users to define source-to-sink data flows using SQL or YAML configurations. Two features set it apart: automated schema evolution, which keeps sinks synchronized when source structures change, and exactly-once delivery and processing guarantees, which prevent duplicate records.
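To make the two guarantees above concrete, here is a minimal sketch in Python. It is not the framework's implementation. It shows one common way to get an exactly-once effect (skipping replayed events by tracking the last applied log offset) together with a simple form of schema evolution (adding a column when the source reports one). The names `ChangeEvent` and `InMemorySink`, the `add_column` operation, and the in-memory storage are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical row-level or schema-level change read from a source log."""
    offset: int                            # log position, strictly increasing
    op: str                                # "insert", "update", "delete", "add_column"
    key: Optional[int] = None              # primary key for row operations
    row: Optional[Dict[str, Any]] = None   # full row image for insert/update
    column: Optional[str] = None           # column name for "add_column"


@dataclass
class InMemorySink:
    """Hypothetical sink that stores rows together with the last applied offset."""
    columns: List[str]
    rows: Dict[int, Dict[str, Any]] = field(default_factory=dict)
    last_offset: int = -1

    def apply(self, event: ChangeEvent) -> bool:
        """Apply an event once; return False if it was already applied."""
        # Replayed events (for example after a restart) are skipped, so
        # at-least-once delivery has an exactly-once effect on the sink.
        if event.offset <= self.last_offset:
            return False

        if event.op == "add_column":
            # Schema evolution: widen the sink and backfill existing rows.
            if event.column not in self.columns:
                self.columns.append(event.column)
                for existing in self.rows.values():
                    existing.setdefault(event.column, None)
        elif event.op in ("insert", "update"):
            # Upsert, keeping only columns known to the sink schema.
            self.rows[event.key] = {c: event.row.get(c) for c in self.columns}
        elif event.op == "delete":
            self.rows.pop(event.key, None)
        else:
            raise ValueError(f"unsupported operation: {event.op}")

        # In a real sink, the data and this offset would have to be
        # committed atomically for the guarantee to hold.
        self.last_offset = event.offset
        return True


events = [
    ChangeEvent(0, "insert", key=1, row={"id": 1, "name": "a"}),
    ChangeEvent(1, "add_column", column="email"),
    ChangeEvent(2, "update", key=1, row={"id": 1, "name": "a", "email": "a@x"}),
]
sink = InMemorySink(columns=["id", "name"])
# Deliver everything, then redeliver the last two events to simulate a replay.
applied = [sink.apply(e) for e in events + events[1:]]
```

In this sketch the replayed events are rejected because their offsets are not greater than `last_offset`, so the sink ends up with a single row that includes the new `email` column.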
Broad capabilities include distributed data synchronization, multi-sink routing, and in-flight data transformation. The framework provides tools for filtering records, generating computed columns, and performing non-blocking incremental snapshotting to capture historical state without locking tables.
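The filtering and computed-column steps can be sketched as a small generator in Python. This is an illustrative assumption about how such a transformation behaves, not the framework's API: the function `transform`, its parameters, and the sample rows are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]


def transform(
    rows: Iterable[Row],
    predicate: Callable[[Row], bool],
    computed: Dict[str, Callable[[Row], Any]],
) -> Iterator[Row]:
    """Filter rows in flight and append computed columns to the survivors."""
    for row in rows:
        # Filtering: drop records that do not satisfy the predicate.
        if not predicate(row):
            continue
        # Copy so the source record is left unmodified.
        out = dict(row)
        # Computed columns are derived from the original row values.
        for name, fn in computed.items():
            out[name] = fn(row)
        yield out


source_rows = [
    {"id": 1, "price": 10.0, "qty": 2},
    {"id": 2, "price": 5.0, "qty": 0},
]
result = list(
    transform(
        source_rows,
        predicate=lambda r: r["qty"] > 0,
        computed={"total": lambda r: r["price"] * r["qty"]},
    )
)
```

Because the function is a generator, records are processed one at a time as they arrive, which matches the streaming setting described above.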
Applications can be packaged into deployable archives containing the necessary connectors for distributed execution.