30 open-source projects similar to debezium/debezium, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Debezium alternative.
Connect is a Kafka data integration platform and stream processing engine used to build declarative pipelines that move and transform messages between Kafka topics and external sources. It functions as a Kafka Connect framework and a change data capture tool, streaming real-time database modifications to synchronize data across distributed environments. The project differentiates itself through a dedicated mapping language for mutating and reshaping message payloads and the ability to execute custom processing logic within a sandboxed WebAssembly runtime. It also provides an observability pip
RethinkDB is a distributed, document-oriented database designed to store and manage JSON-formatted data across scalable clusters. It utilizes a custom log-structured storage engine with B-Tree indexing to ensure high-performance disk I/O and data persistence. The system maintains high availability through automatic sharding and replication, employing a primary-replica voting consensus mechanism to handle node failures and ensure consistent cluster operations. A defining characteristic of the platform is its reactive changefeed engine, which allows applications to subscribe to live data update
Canal is a database replication middleware that performs change data capture by simulating a database replica. It monitors transaction logs to stream incremental data modifications to downstream systems in real time, acting as an event streaming infrastructure that transforms low-level binary logs into structured, consumable message streams. The project distinguishes itself through a high-throughput architecture that utilizes concurrent multi-threaded parsing and stateful log position tracking to ensure reliable data delivery. It employs a pluggable sink architecture that decouples data extra
This project is a feature-rich Go client library designed for interacting with Redis. It serves as a comprehensive interface for managing remote data stores, enabling developers to execute standard database commands, handle complex data structures, and perform asynchronous operations within Go applications. The library distinguishes itself through its support for advanced Redis capabilities, including connection pooling, pipelining, and transactional integrity. It provides specialized primitives for managing distributed clusters, including automated topology updates and request routing to sha
Realtime is a real-time data distribution and synchronization engine that enables applications to stream database changes and coordinate state between clients. It functions as a synchronization layer that monitors database write-ahead logs to provide change data capture and pushes updates to authorized clients via WebSockets. The project features a real-time presence server for tracking the online status of active users and a broadcast service for sending ephemeral messages without database persistence. It organizes communication through channel-based message routing and uses a structured JSO
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Plotly.py is a comprehensive framework for building production-ready data applications and interactive dashboards directly from Python code. It functions as both a high-performance visualization library for browser-based charts and a full-stack tool for transforming analytical scripts into responsive, web-based interfaces. By abstracting away the need for manual HTML or JavaScript, it allows developers to define complex layouts and functional logic using modular, reusable components. The framework distinguishes itself through a robust architecture that handles event orchestration and state sy
Audited is a Ruby on Rails audit log library and change data capture framework. It tracks model changes by recording previous and current attribute values during create, update, and destroy operations to maintain a complete history of database modifications. The system functions as a database versioning tool and user activity tracker. It allows for the retrieval of historical record states by timestamp or index, enables reverting models to previous versions, and associates record modifications with specific user identities and remote IP addresses. The library includes capabilities for sensit
This project is a streaming data integration framework that captures real-time database changes and synchronizes them with downstream systems. It operates as a distributed streaming ETL and database synchronizer, reading database logs and snapshots to propagate row-level modifications to target sinks. The system supports declarative data integration, allowing users to define source-to-sink data flows using SQL or YAML configurations. It distinguishes itself by automating schema evolution to maintain synchronization when source structures change and ensuring exactly-once delivery and processin
Airbyte is a data integration platform designed to synchronize information between diverse applications, databases, and data warehouses. It functions as an extract, transform, and load orchestrator that manages automated data movement workflows across cloud, on-premise, and hybrid environments. The platform provides a standardized interface for connectors, enabling the movement of structured and unstructured data while maintaining stateful checkpoints for reliable incremental syncing. The platform distinguishes itself through a containerized architecture that isolates connectors to prevent de
Benthos is a declarative stream processor and data integration pipeline used to route, transform, and filter information between disparate services. It functions as an at-least-once message broker and change data capture engine, using a transaction model to guarantee message delivery despite system crashes or server faults. The system is defined by an observability-first approach, featuring built-in HTTP health probes, performance metrics export, and distributed request flow tracing. It utilizes a plugin architecture that allows the core engine to be extended with custom binaries for new inpu
Benthos is a stream processing engine and data integration pipeline used for routing, transforming, and connecting data streams between diverse sources and sinks. It functions as event routing middleware and a change data capture tool, streaming real-time database modifications as discrete events for downstream processing. The system utilizes a declarative pipeline configuration, where data flow and processing logic are defined in a single static file. It features a specialized domain-specific language for mapping, filtering, and enriching data payloads, allowing for complex transformations w
SuperduperDB is an AI agent orchestrator and database-integrated machine learning platform. It serves as a framework for building stateful AI agents and retrieval-augmented generation applications by integrating large language models directly with database backends. The project enables the deployment of self-hosted AI infrastructure and the management of language models on private hardware using local checkpoints. It distinguishes itself by allowing users to attach AI components directly to data fields, triggering model execution and automated transformations based on database insertions and
Delta is a lakehouse table format that brings ACID transactions and data warehouse consistency to large scale data lakes on cloud object storage. It serves as an ACID transaction manager, coordinating atomic commits and serializable isolation for concurrent reads and writes across distributed compute engines. The project provides a multi-engine interoperability layer that uses format translation to allow diverse SQL engines and processing frameworks to read and write the same tables. It functions as a data versioning system, utilizing a transaction log to enable time travel, historical snapsh
Readyset is a transparent caching proxy for PostgreSQL and MySQL that sits between an application and its database, intercepting SQL queries and serving cached results from memory. It automatically caches query results on first execution and keeps those caches consistent by consuming the database’s replication stream in real time, enabling faster repeated reads without application code changes. The proxy also supports caching advanced SQL functions such as window functions, bucket functions, and locale-aware collation sorting, and exposes an interface that allows AI agents to inspect proxied q
Materialize is a streaming SQL database that continuously ingests live data from sources such as Kafka, Redpanda, PostgreSQL, and MySQL, and incrementally maintains materialized views. It provides a PostgreSQL-compatible query engine that accepts standard SQL over the PostgreSQL wire protocol, enabling any existing SQL client or BI tool to query real-time data. The system also includes a Model Context Protocol (MCP) server that exposes live materialized view data to AI agents, providing fresh context without polling. Materialize distinguishes itself through its ability to offer configurable c
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
This project provides an integrated backend platform built around a relational database. It automatically generates REST and GraphQL APIs from database schemas, allowing for direct data interaction through standard requests and client libraries. The platform includes a comprehensive authentication system that manages user identity, session handling, and fine-grained access control through database-native row-level security policies. Beyond core data management, the platform offers specialized services for object storage, vector data processing for semantic search, and real-time communication
Dolt is a relational database engine that integrates version control directly into the database management layer. It functions as a version-controlled SQL database that tracks every row and schema change using a commit-based history, allowing users to branch, merge, and audit data modifications. By implementing a wire-protocol-compatible server, the system enables standard SQL clients and tools to interact with versioned data as if they were connecting to a traditional relational database. The platform distinguishes itself by applying repository-style workflows to data management, including s
rqlite is a distributed relational database that replicates SQLite data across a cluster using the Raft consensus algorithm. It functions as a fault-tolerant storage system that provides high availability and a web API for executing SQL queries and managing relational data without requiring native database drivers. The system distinguishes itself by using an HTTP SQL interface to expose database operations and cluster management. It features a real-time change data capture stream that pushes database mutations to external HTTP endpoints via webhooks and supports the scaling of read throughput
The MongoDB Node.js Driver is a programmatic interface and NoSQL database client used to manage document storage and execute operations within a MongoDB database. It serves as an asynchronous database interface and connection manager that enables Node.js applications to integrate with MongoDB servers. The project implements client-side field encryption to secure sensitive data and queries locally before transmission. It also provides a BSON serialization library to convert JavaScript objects into a binary format for efficient storage and network transmission. The driver covers a broad range
SlateDB is a cloud-native key-value store and distributed database engine that utilizes a log-structured merge-tree architecture. It serves as a transactional storage layer designed to persist data directly to cloud object storage. The engine differentiates itself by optimizing read performance for remote storage through the use of bloom filters and multi-level block caching. It employs a single-writer multi-reader model and provides the ability to create zero-copy clones via copy-on-write checkpointing. The system supports atomic transactions, range queries, and snapshot-based concurrency c
ShardingSphere is a distributed SQL database middleware that provides sharding, read-write splitting, and distributed transaction management for relational databases. It functions as a layer that intercepts SQL queries to distribute data across multiple physical database instances for horizontal scaling. The project is distinguished by its ability to operate as either a standalone transparent database proxy or via direct integration as a JDBC driver. It features a SQL dialect translator that parses queries into abstract syntax trees to convert syntax between different database engines, enabli
RedisInsight is a graphical user interface and management tool for browsing, analyzing, and administering Redis databases. It provides a visual environment for exploring key-value data structures, managing database instances, and performing data analysis across different operating systems and deployments. The tool distinguishes itself by providing dedicated visual managers for complex operations, including a vector database manager for configuring embeddings and similarity searches, a query workbench for executing raw commands and Lua scripts, and a performance monitoring dashboard for tracki
dbt-core is a command-line framework for transforming data within a warehouse using modular SQL and version control. It functions as a data transformation engine that enables users to define data structures and business logic through declarative configuration files, which the system then compiles into executable code. By managing complex data dependencies through a directed acyclic graph, it ensures that transformation tasks execute in the correct order while maintaining a manifest-driven state to track lineage and execution history. The project distinguishes itself through an adapter-based d
Vitess is a database clustering system for horizontal scaling of MySQL. It functions as a middleware layer that abstracts complex sharding and physical topology, allowing applications to interact with a distributed database environment through a unified interface. By intercepting and routing SQL queries across multiple shards, it enables large-scale data management while maintaining the appearance of a single database instance. The platform distinguishes itself through its ability to perform online schema migrations and distributed transaction coordination without requiring application downti
TiDB is a horizontally scalable, distributed SQL database designed to provide consistent transactional storage and high-performance analytical processing within a single unified architecture. It utilizes a decoupled compute-storage design and a distributed key-value storage layer to ensure horizontal scalability and efficient range-based queries. By employing a consensus-based replication algorithm, the system maintains high availability and automatic failover across multiple nodes and geographical regions. The platform distinguishes itself through its hybrid transactional and analytical proc