# apache/flink-cdc

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/apache-flink-cdc).**

6,430 stars · 2,170 forks · Java · Apache-2.0

## Links

- GitHub: https://github.com/apache/flink-cdc
- Homepage: https://nightlies.apache.org/flink/flink-cdc-docs-stable
- awesome-repositories: https://awesome-repositories.com/repository/apache-flink-cdc.md

## Topics

`batch` `cdc` `change-data-capture` `data-integration` `data-pipeline` `distributed` `elt` `etl` `flink` `kafka` `mysql` `paimon` `postgresql` `real-time` `schema-evolution`

## Description

This project is a streaming data integration framework built on Apache Flink that captures real-time database changes and synchronizes them with downstream systems. It operates as a distributed streaming ETL tool and database synchronizer, reading database snapshots and change logs to propagate row-level modifications to target sinks.
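The snapshot-then-log propagation described above can be sketched in a few lines of Java. This is a minimal illustration that assumes a single table keyed by an integer primary key: the snapshot seeds a downstream replica, and change-log events are applied on top of it in order. The `ChangeOp`, `RowChange`, and `ReplicaSync` names are hypothetical and are not part of the Flink CDC API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical operation types for a row-level change; not the Flink CDC API.
enum ChangeOp { INSERT, UPDATE, DELETE }

// Hypothetical change event: operation, primary key, and new row value.
record RowChange(ChangeOp op, int key, String value) {}

final class ReplicaSync {
    // Seeds the replica from the snapshot, then applies change-log events in
    // order: inserts and updates upsert by key, deletes remove the key.
    static Map<Integer, String> apply(Map<Integer, String> snapshot, List<RowChange> log) {
        Map<Integer, String> replica = new HashMap<>(snapshot);
        for (RowChange change : log) {
            switch (change.op()) {
                case INSERT, UPDATE -> replica.put(change.key(), change.value());
                case DELETE -> replica.remove(change.key());
            }
        }
        return replica;
    }
}
```

Because the log is applied strictly in order, the replica converges to the source state as of the last applied event; the real framework additionally tracks log offsets so it can resume from the right position after a restart.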

The system supports declarative data integration: users define source-to-sink data flows in SQL or YAML. It distinguishes itself by automating schema evolution, so pipelines stay synchronized when source table structures change, and by providing exactly-once processing guarantees that prevent duplicate records.
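Schema evolution can be pictured as replaying schema-change events against the target table's column list. The Java sketch below assumes three hypothetical event types (`AddColumn`, `DropColumn`, `RenameColumn`) and models a schema as an ordered map of column name to type. Flink CDC defines its own event classes, and a real sink must also translate each change into target-specific DDL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical schema-change events; Flink CDC defines its own event classes.
interface SchemaChange {}
record AddColumn(String name, String type) implements SchemaChange {}
record DropColumn(String name) implements SchemaChange {}
record RenameColumn(String from, String to) implements SchemaChange {}

final class SchemaEvolver {
    // Returns a new target schema (column name -> type, in column order)
    // with the source-side change applied; the input map is not modified.
    static Map<String, String> evolve(Map<String, String> target, SchemaChange change) {
        Map<String, String> next = new LinkedHashMap<>();
        if (change instanceof RenameColumn r) {
            // Rebuild the map so the renamed column keeps its position.
            target.forEach((col, type) -> next.put(col.equals(r.from()) ? r.to() : col, type));
            return next;
        }
        next.putAll(target);
        if (change instanceof AddColumn a) {
            next.putIfAbsent(a.name(), a.type()); // ignore a re-sent ADD COLUMN
        } else if (change instanceof DropColumn d) {
            next.remove(d.name());
        }
        return next;
    }
}
```

Returning a fresh map rather than mutating the input keeps each schema version available, which is convenient when a pipeline has to reason about records written under the previous schema.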

Core capabilities include distributed data synchronization, multi-sink routing, and in-flight data transformation. The framework can filter records, generate computed columns, and take non-blocking incremental snapshots that capture historical state without locking tables.
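A non-blocking incremental snapshot can be approximated in two steps: split the primary-key space into chunks, then, for each chunk, merge the rows returned by its `SELECT` with the log events recorded between a low and a high watermark taken around that read. The following Java sketch assumes a numeric primary key and in-memory inputs. `KeyRange`, `LogEvent`, and `ChunkedSnapshot` are hypothetical names, and the real connectors handle watermarks, split assignment, and recovery in their own way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical half-open primary-key range [start, endExclusive).
record KeyRange(long start, long endExclusive) {
    boolean contains(long key) {
        return key >= start && key < endExclusive;
    }
}

// Hypothetical log event captured between a chunk's low and high watermarks.
record LogEvent(long key, String value, boolean delete) {}

final class ChunkedSnapshot {
    // Splits the key space [min, max] into chunks of at most chunkSize keys,
    // so each chunk can be read independently (and in parallel).
    static List<KeyRange> split(long min, long max, long chunkSize) {
        List<KeyRange> chunks = new ArrayList<>();
        for (long start = min; start <= max; start += chunkSize) {
            chunks.add(new KeyRange(start, Math.min(start + chunkSize, max + 1)));
        }
        return chunks;
    }

    // Upserts the log events that fall inside the chunk onto the rows read by
    // the chunk's SELECT, yielding the chunk's state as of the high watermark.
    static TreeMap<Long, String> reconcile(TreeMap<Long, String> selected, KeyRange chunk,
                                           List<LogEvent> betweenWatermarks) {
        TreeMap<Long, String> result = new TreeMap<>(selected);
        for (LogEvent e : betweenWatermarks) {
            if (!chunk.contains(e.key())) {
                continue; // outside this chunk; other chunks reconcile their own events
            }
            if (e.delete()) {
                result.remove(e.key());
            } else {
                result.put(e.key(), e.value());
            }
        }
        return result;
    }
}
```

No table lock is needed because concurrent writes during the `SELECT` are not blocked; they show up as log events between the watermarks and are folded into the chunk afterwards.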

Applications can be packaged into deployable archives containing the necessary connectors for distributed execution.

## Tags

### Data & Databases

- [Change Data Capture](https://awesome-repositories.com/f/data-databases/change-data-capture.md) — Captures real-time database modifications and streams them as events to synchronize state with external systems. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/connectors/flink-sources/overview/))
- [Distributed Stream Processors](https://awesome-repositories.com/f/data-databases/plugin-based-etl-frameworks/distributed-stream-processors.md) — Implements a distributed streaming ETL framework for filtering, transforming, and routing data in flight.
- [Data Integration Pipelines](https://awesome-repositories.com/f/data-databases/data-integration-pipelines.md) — Provides a pipeline system for orchestrating the movement and routing of data streams between database sources and target sinks.
- [Exactly-Once Processing Semantics](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing-frameworks/exactly-once-processing-semantics.md) — Ensures that historical data and change events are processed exactly once, even during job failures. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/))
- [Schema Evolution](https://awesome-repositories.com/f/data-databases/data-type-schemas/schema-evolution.md) — Detects structural modifications in source tables and automatically applies those changes to the target system. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/))
- [Database Synchronization Tools](https://awesome-repositories.com/f/data-databases/database-synchronization-tools.md) — Synchronizes database changes in real time with automated schema evolution and exactly-once delivery guarantees.
- [Distributed Data Synchronization Systems](https://awesome-repositories.com/f/data-databases/distributed-data-synchronization-systems.md) — Moves data from source databases to target systems in real-time or batch mode using a distributed engine. ([source](https://cdn.jsdelivr.net/gh/apache/flink-cdc@master/README.md))
- [Snapshot-to-Log Transitions](https://awesome-repositories.com/f/data-databases/in-memory-caches/persistence-managers/snapshotting-and-command-logging/snapshot-to-log-transitions.md) — Reads an initial database snapshot and transitions to change logs to ensure consistency after failures. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/connectors/flink-sources/overview/))
- [Real-Time Data Streaming](https://awesome-repositories.com/f/data-databases/real-time-data-streaming.md) — Supports building custom streaming applications that process and deliver data in real time. ([source](https://cdn.jsdelivr.net/gh/apache/flink-cdc@master/README.md))
- [Real-time Data Synchronization](https://awesome-repositories.com/f/data-databases/real-time-data-synchronization.md) — Moves data from source databases to target systems in real time to keep downstream environments updated.
- [Automated Schema Propagation](https://awesome-repositories.com/f/data-databases/schema-synchronizers/schema-propagation-protocols/automated-schema-propagation.md) — Detects structural changes in source databases and automatically applies modifications to downstream target schemas.
- [Snapshot Synchronization](https://awesome-repositories.com/f/data-databases/snapshot-synchronization.md) — Captures historical data using snapshots and transitions to real-time capture to bootstrap synchronization. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/))
- [Apache Flink Connectors](https://awesome-repositories.com/f/data-databases/streaming-source-and-sink-integration/apache-flink-connectors.md) — A streaming data integration framework that leverages Apache Flink connectors to synchronize database changes.
- [Full Instance Synchronization](https://awesome-repositories.com/f/data-databases/change-data-capture/full-instance-synchronization.md) — Synchronizes all tables from a source database instance to downstream systems within a single job. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/))
- [Cross-Database Data Migrations](https://awesome-repositories.com/f/data-databases/cross-database-data-migrations.md) — Moves entire database instances to data lakes or analytical warehouses using snapshots and change logs.
- [Computed Columns](https://awesome-repositories.com/f/data-databases/data-governance-modeling/data-modeling-schemas/data-schemas/column-definitions/computed-columns.md) — Generates new data columns based on existing fields or metadata using evaluation expressions. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/core-concept/transform/))
- [In-Flight Column Projection](https://awesome-repositories.com/f/data-databases/data-governance-modeling/data-modeling-schemas/data-schemas/column-definitions/computed-columns/in-flight-column-projection.md) — Transforms data in flight by applying evaluation expressions to filter records and generate computed columns.
- [Custom Connector Development](https://awesome-repositories.com/f/data-databases/data-i-o/custom-connector-development.md) — Provides interfaces for creating custom source and sink adapters to integrate external systems into data pipelines. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/connectors/pipeline-connectors/overview/))
- [Multi-Sink Routing](https://awesome-repositories.com/f/data-databases/data-sinking/multi-sink-routing.md) — Maps specific source tables to designated sink tables to organize data distribution across multiple target systems. ([source](https://cdn.jsdelivr.net/gh/apache/flink-cdc@master/README.md))
- [Database Layout Extraction](https://awesome-repositories.com/f/data-databases/data-sources/source-metadata-retrievers/database-layout-extraction.md) — Retrieves namespaces, schemas, and table structures from external systems to identify the current database layout. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/developer-guide/understand-flink-cdc-api/))
- [Data Transformation Functions](https://awesome-repositories.com/f/data-databases/data-transformation-functions.md) — Removes unnecessary records and modifies data columns using arithmetic, string, and logical functions during synchronization. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/core-concept/transform/))
- [Sink Data Loading](https://awesome-repositories.com/f/data-databases/sink-data-loading.md) — Loads processed data into sink targets such as search engines, data lakes, and analytical databases. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/connectors/pipeline-connectors/overview/))
- [Source-to-Sink Table Mappings](https://awesome-repositories.com/f/data-databases/source-to-sink-table-mappings.md) — Defines rules to match source tables to destination tables using one-to-one or pattern-based renaming. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/core-concept/route/))
- [SQL-Based CDC Integrations](https://awesome-repositories.com/f/data-databases/sql-based-cdc-integrations.md) — Defines change data capture sources using SQL statements to query and process database changes. ([source](https://cdn.jsdelivr.net/gh/apache/flink-cdc@master/README.md))
- [Table Update Monitoring](https://awesome-repositories.com/f/data-databases/table-update-monitoring.md) — Tracks changes to specific database tables to trigger synchronization events. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/connectors/flink-sources/overview/))
- [Streaming Connector Abstractions](https://awesome-repositories.com/f/data-databases/unified-storage-interfaces/unified-data-connector-interfaces/streaming-connector-abstractions.md) — Decouples source and sink implementations from the engine using standardized streaming connector interfaces.
- [User-Defined Functions](https://awesome-repositories.com/f/data-databases/user-defined-functions.md) — Integrates custom logic classes to perform specialized data transformations via programmable evaluation methods. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/core-concept/transform/))
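The filter, computed-column, and user-defined-function entries above share one idea: each row is tested against a predicate and, if it survives, extended with values computed from its fields. In Flink CDC these rules are written as expressions in the pipeline definition; the Java sketch below only models the evaluation order, and `ScalarUdf` and `InFlightTransform` are hypothetical names rather than the framework's UDF interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for a user-defined function: one evaluation method
// that computes a value from the current row.
@FunctionalInterface
interface ScalarUdf {
    Object eval(Map<String, Object> row);
}

final class InFlightTransform {
    // Drops rows that fail the filter, then appends each computed column,
    // evaluated against the original fields of the surviving row.
    static List<Map<String, Object>> apply(List<Map<String, Object>> rows,
                                           Predicate<Map<String, Object>> filter,
                                           Map<String, ScalarUdf> computedColumns) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (!filter.test(row)) {
                continue;
            }
            Map<String, Object> projected = new LinkedHashMap<>(row);
            computedColumns.forEach((name, udf) -> projected.put(name, udf.eval(row)));
            out.add(projected);
        }
        return out;
    }
}
```

Filtering before computing columns avoids evaluating functions on rows that would be discarded anyway, which matters when a UDF is expensive.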

### Part of an Awesome List

- [Streaming ETL Pipelines](https://awesome-repositories.com/f/awesome-lists/data/data-processing-and-etl/streaming-etl-pipelines.md) — Provides tools for filtering, transforming, and enriching data in flight as it moves between a source database and a sink.

### DevOps & Infrastructure

- [Exactly-Once Processing Guarantees](https://awesome-repositories.com/f/devops-infrastructure/fault-tolerance/exactly-once-processing-guarantees.md) — Guarantees that data is written to the target system exactly once to prevent duplicate records. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/get-started/introduction/))

### Software Engineering & Architecture

- [Incremental Snapshotting](https://awesome-repositories.com/f/software-engineering-architecture/architectural-design-patterns/state-management/persistence-and-serialization/state-serialization/state-snapshots/database-snapshots/incremental-snapshotting.md) — Reads historical database state while simultaneously capturing real-time change logs without locking tables.
- [Pipeline Orchestration](https://awesome-repositories.com/f/software-engineering-architecture/application-lifecycle-management/configuration-management/configuration-formats-and-schemas/yaml-configuration-files/pipeline-orchestration.md) — Translates declarative YAML configuration files into distributed streaming jobs composed of execution operators.
- [Declarative Pipeline Definitions](https://awesome-repositories.com/f/software-engineering-architecture/dataflow-frameworks/live-dataflow-graph-modifiers/declarative-pipeline-definitions.md) — Defines sources, sinks, and routing rules using structured configuration languages like YAML to deploy jobs. ([source](https://cdn.jsdelivr.net/gh/apache/flink-cdc@master/README.md))
- [Two-Phase Commit Protocols](https://awesome-repositories.com/f/software-engineering-architecture/distributed-transaction-coordinators/two-phase-commit-protocols.md) — Ensures exactly-once delivery by coordinating transaction commits between the streaming engine and destination systems.
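Two-phase commit can be sketched as a sink that buffers writes in an open transaction, seals ("pre-commits") it when a checkpoint is taken, and makes it visible only after the checkpoint is confirmed. The `TwoPhaseCommitSink` class below is hypothetical and keeps everything in memory; a real sink would hold its transactions in the destination system and persist pending ones in checkpointed state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sink illustrating two-phase commit; not a Flink or
// Flink CDC API.
final class TwoPhaseCommitSink {
    private final List<String> committed = new ArrayList<>();            // visible to readers
    private final TreeMap<Long, List<String>> pending = new TreeMap<>(); // pre-committed, by checkpoint
    private List<String> open = new ArrayList<>();                       // current transaction

    void write(String value) {
        open.add(value);
    }

    // Phase 1: when a checkpoint is taken, seal the open transaction under
    // that checkpoint id and start a fresh one.
    void preCommit(long checkpointId) {
        pending.put(checkpointId, open);
        open = new ArrayList<>();
    }

    // Phase 2: once a checkpoint is confirmed, commit every pending
    // transaction up to and including it. Committed entries are removed,
    // so a repeated confirmation is a no-op.
    void commit(long checkpointId) {
        Map<Long, List<String>> ready = pending.headMap(checkpointId, true);
        ready.values().forEach(committed::addAll);
        ready.clear(); // the head map is a view, so this also clears `pending`
    }

    // On failure, discard uncommitted writes; the engine replays input from
    // the last completed checkpoint, so they are not lost.
    void abortOpen() {
        open = new ArrayList<>();
    }

    List<String> visible() {
        return List.copyOf(committed);
    }
}
```

Readers never observe records from a transaction whose checkpoint might still be rolled back, and replayed records land in a new transaction instead of duplicating committed ones.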

### Development Tools & Productivity

- [Event Deserialization](https://awesome-repositories.com/f/development-tools-productivity/change-tracking/row-level-change-logs/change-data-capture/event-deserialization.md) — Converts database change events into JSON with optional schema metadata; omitting the schema keeps messages smaller and processing faster. ([source](https://nightlies.apache.org/flink/flink-cdc-docs-stable/docs/connectors/flink-sources/overview/))
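Debezium-style change events wrap the row image in an envelope with `before`, `after`, and `op` fields (`c` create, `u` update, `d` delete, `r` snapshot read), and JSON converters can optionally prepend a `schema` block. The hypothetical `ChangeEventJson` serializer below shows where that optional block sits and why leaving it out shrinks each message. Its schema block is a placeholder, and it is not the connector's actual deserialization schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical serializer for a Debezium-style change envelope; not the
// connector's actual deserialization schema.
final class ChangeEventJson {
    // op: "c" create, "u" update, "d" delete, "r" snapshot read.
    static String toJson(String op, Map<String, Object> before, Map<String, Object> after,
                         boolean includeSchema) {
        String payload = "{\"before\":" + row(before) + ",\"after\":" + row(after)
                + ",\"op\":" + quote(op) + "}";
        if (!includeSchema) {
            return payload;
        }
        // Placeholder: a real schema block describes every field and its type,
        // which is what makes schema-carrying messages much larger.
        return "{\"schema\":{\"type\":\"struct\"},\"payload\":" + payload + "}";
    }

    private static String row(Map<String, Object> row) {
        if (row == null) {
            return "null";
        }
        StringJoiner fields = new StringJoiner(",", "{", "}");
        row.forEach((k, v) -> fields.add(quote(k) + ":" + value(v)));
        return fields.toString();
    }

    private static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    // Minimal JSON string escaping for quotes, backslashes, and control chars.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }
}
```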

### Networking & Communication

- [Stream-to-Sink Routing](https://awesome-repositories.com/f/networking-communication/messaging-api-integrations/topic-message-listeners/stream-to-sink-routing.md) — Routes multi-table data streams to designated sinks using pattern-based renaming and routing rules.
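Pattern-based routing can be modeled as an ordered list of rules, each pairing a regular expression over `database.table` names with a sink table name that may contain a replace symbol. In this hypothetical Java sketch the first matching rule wins (an assumption, not documented behavior) and the symbol is replaced by the source table name, so sharded tables such as `order_01` can be merged into one sink while other tables keep their names under a prefix.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical route rule: a regex over "database.table" names and a sink
// table name that may contain a replace symbol (null means none).
record RouteRule(Pattern source, String sink, String replaceSymbol) {
    RouteRule(String sourceRegex, String sink, String replaceSymbol) {
        this(Pattern.compile(sourceRegex), sink, replaceSymbol);
    }
}

final class TableRouter {
    // Returns the sink table for a fully qualified source table, using the
    // first matching rule (an assumption of this sketch), or empty if none match.
    static Optional<String> route(List<RouteRule> rules, String sourceTable) {
        for (RouteRule rule : rules) {
            if (rule.source().matcher(sourceTable).matches()) {
                String table = sourceTable.substring(sourceTable.indexOf('.') + 1);
                String sink = rule.replaceSymbol() == null
                        ? rule.sink()
                        : rule.sink().replace(rule.replaceSymbol(), table);
                return Optional.of(sink);
            }
        }
        return Optional.empty();
    }
}
```

For example, a rule matching `app_db\.order_\d+` to `ods_db.ods_orders` placed before a catch-all `app_db\..*` to `ods_db.ods_<>` sends every order shard to one table and every other table to its own prefixed copy.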
