# redpanda-data/connect

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/redpanda-data-connect).**

8,681 stars · 944 forks · Go

## Links

- GitHub: https://github.com/redpanda-data/connect
- Homepage: https://docs.redpanda.com/connect/home/
- awesome-repositories: https://awesome-repositories.com/repository/redpanda-data-connect.md

## Topics

`amqp` `cqrs` `data-engineering` `data-ops` `etl` `event-sourcing` `go` `golang` `kafka` `logs` `message-bus` `message-queue` `nats` `rabbitmq` `stream-processing` `stream-processor` `streaming-data`

## Description

Connect is a Kafka data integration platform and stream processing engine used to build declarative pipelines that move and transform messages between Kafka topics and external sources and sinks. It fills the role of a Kafka Connect-style connector framework and a change data capture tool, streaming database modifications in real time to keep data synchronized across distributed environments.

The project differentiates itself through Bloblang, a dedicated mapping language for mutating and reshaping message payloads, and through the ability to execute custom processing logic within a sandboxed WebAssembly runtime. It also provides an observability pipeline that exports metrics and execution traces using the OpenTelemetry standard.

The system covers a broad range of integration capabilities, including cloud data warehousing targets such as BigQuery and Apache Iceberg catalogs, as well as SQL data management and cloud storage integration. It also supports Grok text processing, schema registry integration, and broker message routing that distributes data to multiple outputs.

Configuration is managed through structured files, with available utilities for configuration schema validation and natural language pipeline generation.

## Tags

### Data & Databases

- [Real-Time Data Integration Platforms](https://awesome-repositories.com/f/data-databases/real-time-data-integration-platforms.md) — Provides a platform for synchronizing live data across heterogeneous storage and analytical environments using declarative pipelines.
- [Change Data Capture](https://awesome-repositories.com/f/data-databases/change-data-capture.md) — Provides real-time streaming of database modifications by polling transaction logs or using change-tracking mechanisms.
- [Change Data Capture Tools](https://awesome-repositories.com/f/data-databases/change-data-capture-tools.md) — Monitors and streams database modifications to Kafka topics and other external systems in real time.
- [Kafka Connectors](https://awesome-repositories.com/f/data-databases/data-ingestion/kafka-connectors.md) — Provides a framework for building declarative data pipelines that move and transform messages between Kafka and external sources.
- [Stream Management](https://awesome-repositories.com/f/data-databases/data-ingestion/kafka-connectors/stream-management.md) — Manages the flow of messages through Kafka topics with built-in support for batching and consumer group control.
- [Data Integration Pipelines](https://awesome-repositories.com/f/data-databases/data-integration-pipelines.md) — Provides systems that orchestrate the movement and routing of data streams between diverse sources and sinks. ([source](https://github.com/redpanda-data/connect#readme))
- [Data Pipeline Configurations](https://awesome-repositories.com/f/data-databases/data-pipeline-configurations.md) — Uses structured files to declaratively define the topology and data flow between sources and sinks. ([source](https://github.com/redpanda-data/connect/blob/main/README.md))
- [Stream Processing Engines](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing-frameworks/stream-processing-engines.md) — Operates as a system that performs continuous computation, transformation, and enrichment on real-time data streams.
- [Data Transformation Languages](https://awesome-repositories.com/f/data-databases/data-transformation-languages.md) — Provides Bloblang, a dedicated declarative language for mutating, filtering, and reshaping message payloads. ([source](https://github.com/redpanda-data/connect/blob/main/CHANGELOG.md))
- [Kafka Client Integrations](https://awesome-repositories.com/f/data-databases/kafka-client-integrations.md) — Integrates with Apache Kafka clusters as a producer and consumer with batching and consumer group control. ([source](https://github.com/redpanda-data/connect/blob/main/CHANGELOG.md))
- [Kafka Connect Frameworks](https://awesome-repositories.com/f/data-databases/kafka-connect-frameworks.md) — Functions as a distributed system for connecting Kafka clusters to databases and cloud storage via source and sink connectors.
- [Message Routing](https://awesome-repositories.com/f/data-databases/message-brokers/message-routing.md) — Provides advanced routing capabilities to distribute data streams to multiple targets using fan-out, sequential, or fallback patterns. ([source](https://github.com/redpanda-data/connect/blob/main/CHANGELOG.md))
- [Stream Transformations](https://awesome-repositories.com/f/data-databases/stream-transformations.md) — Implements real-time processing of data streams to apply mutations and reshaping using a dedicated mapping language. ([source](https://github.com/redpanda-data/connect#readme))
- [Data Enrichment](https://awesome-repositories.com/f/data-databases/data-enrichment.md) — Enriches real-time data streams by sending payloads to external HTTP endpoints and integrating the responses. ([source](https://github.com/redpanda-data/connect/blob/main/CHANGELOG.md))
- [Iceberg Catalog Exporters](https://awesome-repositories.com/f/data-databases/data-export/iceberg-catalog-exporters.md) — Provides utilities for transferring processed data into Apache Iceberg-based catalogs with automatic schema evolution. ([source](https://github.com/redpanda-data/connect/blob/main/CHANGELOG.md))
- [Data Warehousing](https://awesome-repositories.com/f/data-databases/data-warehousing.md) — Syncs streaming data to large-scale analytics warehouses and table catalogs for high-performance analytical queries.
- [BigQuery Connectors](https://awesome-repositories.com/f/data-databases/database-connectivity/bigquery-connectors.md) — Ships specialized modules for synchronizing data to BigQuery analytics warehouses with support for upsert modes. ([source](https://github.com/redpanda-data/connect/blob/main/CHANGELOG.md))
- [Cloud Object Storage](https://awesome-repositories.com/f/data-databases/file-storage-systems/cloud-object-storage.md) — Supports reading and writing unstructured data as blobs across various cloud storage providers. ([source](https://github.com/redpanda-data/connect/blob/main/CHANGELOG.md))
- [Schema Registries](https://awesome-repositories.com/f/data-databases/schema-registries.md) — Encodes and decodes messages by fetching versioned schema definitions from a remote schema registry. ([source](https://github.com/redpanda-data/connect/blob/main/CHANGELOG.md))
- [SQL Data Retrieval](https://awesome-repositories.com/f/data-databases/sql-data-retrieval.md) — Extracts, filters, and aggregates data from relational tables using standard SQL query language. ([source](https://github.com/redpanda-data/connect/blob/main/CHANGELOG.md))
- [Transactional Delivery Guarantees](https://awesome-repositories.com/f/data-databases/transactional-delivery-guarantees.md) — Ensures message reliability and prevents data loss using a local transaction model for managing offsets and state.

### Programming Languages & Runtimes

- [Data Mapping Languages](https://awesome-repositories.com/f/programming-languages-runtimes/pipeline-domain-specific-languages/data-mapping-languages.md) — Uses a dedicated domain-specific language to filter, mutate, and reshape streaming data payloads.
- [Embedded Wasm Runtimes](https://awesome-repositories.com/f/programming-languages-runtimes/runtime-execution-environments/webassembly/embedded-wasm-runtimes.md) — Runs custom processing logic within the data pipeline using an embedded WebAssembly runtime. ([source](https://github.com/redpanda-data/connect/blob/main/CHANGELOG.md))

### Networking & Communication

- [At-Least-Once Delivery Guarantees](https://awesome-repositories.com/f/networking-communication/message-delivery-guarantees/at-least-once-delivery-guarantees.md) — Ensures every message is delivered at least once using an in-process transaction model to prevent data loss. ([source](https://github.com/redpanda-data/connect#readme))

### Software Engineering & Architecture

- [Plugin Architectures](https://awesome-repositories.com/f/software-engineering-architecture/integration-extensibility/extensibility/plugin-architectures.md) — Implements a plugin architecture allowing new source and sink capabilities to be added as compiled modules.
- [Plugin Architectures](https://awesome-repositories.com/f/software-engineering-architecture/plugin-architectures.md) — Provides an extensibility framework for implementing custom source and sink capabilities using compiled plugins. ([source](https://github.com/redpanda-data/connect#readme))
- [Schema Registries](https://awesome-repositories.com/f/software-engineering-architecture/schema-registries.md) — Integrates with remote schema registries to decode and encode message payloads using versioned definitions.
- [Wasm-Based Plugins](https://awesome-repositories.com/f/software-engineering-architecture/software-architecture/architectural-patterns/plugin-module-systems/modular-plugin-architectures/plugin-based-architectures/wasm-based-plugins.md) — Executes custom processing logic via WebAssembly modules for safe, portable, and sandboxed pipeline extensions.

### System Administration & Monitoring

- [End-to-End Message Tracing](https://awesome-repositories.com/f/system-administration-monitoring/end-to-end-message-tracing.md) — Implements end-to-end tracing to visualize the complete flow of messages as they move through the data pipeline. ([source](https://github.com/redpanda-data/connect/blob/main/README.md))
- [Pipeline Health Monitors](https://awesome-repositories.com/f/system-administration-monitoring/health-monitoring/pipeline-health-monitors.md) — Includes specialized monitors to track the execution status, timing, and operational health of data pipelines. ([source](https://github.com/redpanda-data/connect#readme))
- [OpenTelemetry Exporters](https://awesome-repositories.com/f/system-administration-monitoring/opentelemetry-exporters.md) — Exports pipeline telemetry and execution traces using standard OpenTelemetry protocols for end-to-end visibility.
- [Performance Metrics Exporters](https://awesome-repositories.com/f/system-administration-monitoring/performance-metrics-exporters.md) — Exposes pipeline performance telemetry in standard formats for integration with external monitoring backends. ([source](https://github.com/redpanda-data/connect/blob/main/README.md))
