30 open-source projects similar to pingcap/tikv, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Tikv alternative.
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
OceanBase is a distributed SQL database designed for high availability and strong consistency across multiple nodes and regions. It functions as a hybrid transactional and analytical processing engine, allowing real-time analytics and transactions to execute on a single data copy. The system also serves as a vector database engine for indexing and querying vector data to power semantic search and recommendation systems. The platform features native compatibility layers for MySQL and Oracle, enabling the migration of legacy workloads without rewriting SQL code. It utilizes a Paxos-based distri
Nebula is a distributed graph database designed for storing and querying massive volumes of interconnected vertices and edges across a horizontally scalable cluster. It functions as a Kubernetes-native database and a distributed graph analytics engine, utilizing a Raft-based distributed store to ensure strong consistency and high availability. The system features an OpenCypher query engine for performing complex graph traversals and pattern matching. It distinguishes itself with a decoupled compute-storage architecture and a shared-nothing distributed design, allowing query processing and dat
YugabyteDB is a distributed SQL database and relational data store designed for horizontal scalability and high availability across multiple nodes or regions. It functions as a cloud-native system that ensures continuous availability and supports PostgreSQL compatible query languages and drivers. The system includes specialized capabilities as a vector database for AI, utilizing high-dimensional indexing to perform similarity searches. It is engineered as a multi-region cloud database that synchronizes data across different geographic locations to maintain global availability. The project co
pgdog is a PostgreSQL sharding proxy, distributed SQL router, and connection pooler. It is designed to enable horizontal data distribution by splitting tables and indices across multiple independent servers to scale storage and processing capacity. The project distinguishes itself through online resharding capabilities, using logical replication to move data between shards without application downtime. It supports multiple routing strategies, including hash, list, and range-based query routing, and manages distributed atomic transactions using a two-phase commit process to ensure consistency
TiKV is a distributed transactional key-value store designed for horizontal scalability and high availability. It functions as a storage engine that maintains massive datasets across a cluster of physical nodes, ensuring that information remains accessible and consistent even when individual hardware components fail. The system utilizes a consensus-based replication model to synchronize data across nodes, ensuring that all replicas agree on the order of operations. It manages data distribution through a sharding mechanism that partitions large datasets into smaller groups, each governed by in
FoundationDB is an ACID-compliant distributed transactional key-value store. It functions as a scalable database engine that ensures strict serializability and data consistency across a cluster of servers using a shared-nothing architecture. The system is distinguished by its multi-region replication capabilities, allowing data to be synchronized across different datacenters for high availability and disaster recovery. It utilizes optimistic concurrency control to manage distributed transactions and employs a majority-based coordination system to maintain cluster state. The platform provides
etcd is a distributed key-value store and configuration store designed to maintain a consistent set of data across a cluster of nodes. It functions as a reliable registry for storing and synchronizing critical settings and metadata used by distributed applications. The system implements the Raft consensus algorithm to ensure data consistency and leader election across servers. To protect data transfers and verify node identities, it utilizes a network security layer based on mutual TLS and client certificates. Its capabilities cover distributed configuration management, cluster state synchro
Weaviate is a cloud-native vector database and distributed vector store designed to save high-dimensional vectors alongside structured data. It functions as a hybrid search engine that combines vector similarity, keyword matching, and structured metadata filtering within a single query. The system is optimized for retrieval-augmented generation, integrating vector search with generative AI and reranking to power question-and-answer workflows. It distinguishes itself through the ability to merge semantic search with traditional keyword queries and structured metadata filters to improve result
YDB is a distributed SQL database and analytical engine designed for horizontal scalability and strong consistency. It functions as a multi-model system that supports transactional and analytical workloads through a distributed architecture providing serializable ACID transactions. The system is distinguished by its broad protocol compatibility, implementing the PostgreSQL wire protocol for standard SQL drivers and the Kafka protocol for messaging and streaming. It further serves as a vector database, supporting vector indexes and approximate nearest neighbor searches for semantic search and
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Garnet is a multi-threaded in-memory database and distributed key-value store. It functions as a high-performance remote cache store that implements the RESP wire protocol to maintain compatibility with existing Redis clients and libraries. The project is distinguished by a shared-memory architecture that enables parallel request processing across multiple cores for sub-millisecond latency. It features a tiered storage system that automatically offloads colder data from system memory to SSD or cloud storage layers, and includes a specialized vector search database for high-dimensional similar
Olric is a distributed data grid and in-memory key-value store that partitions and replicates data across a cluster of servers. It serves as a shared memory system for managing distributed maps, performing atomic operations, and acting as an in-memory data cache. The system provides a distributed locking mechanism for concurrency control and a pub-sub messaging system that broadcasts and routes messages over named channels across the cluster. The platform covers wide-ranging capabilities including cluster management and orchestration, data replication with configurable quorums, and automated
This is a Raft consensus library and distributed consensus engine implemented in Go. It provides the primitives necessary to build fault-tolerant distributed services by implementing a replicated state machine that ensures a group of servers agree on a shared system state through leader election and log replication. The project distinguishes itself through a pluggable architecture for storage backends and snapshot storage, decoupling the consensus logic from physical persistence. It includes specialized mechanisms for leadership transfer, protocol version management to support rolling upgrade
rqlite is a distributed relational database that replicates SQLite data across a cluster using the Raft consensus algorithm. It functions as a fault-tolerant storage system that provides high availability and a web API for executing SQL queries and managing relational data without requiring native database drivers. The system distinguishes itself by using an HTTP SQL interface to expose database operations and cluster management. It features a real-time change data capture stream that pushes database mutations to external HTTP endpoints via webhooks and supports the scaling of read throughput
Dragonboat is a Go implementation of the Raft consensus protocol designed to maintain consistent state across a distributed cluster of nodes. It provides a library for building distributed state machines that ensure data integrity and fault tolerance during system failures. The project distinguishes itself through a multi-group Raft implementation, which partitions data across independent consensus groups to distribute workloads and increase overall system processing capacity. It also incorporates mutual TLS to encrypt inter-node communication and verify the identity of cluster members. The
Jocko is a cloud-native event streaming platform and distributed commit log implemented in Go. It functions as a distributed message broker that ensures data durability and high availability by replicating record sequences across a cluster. The system is designed as a Zookeeperless event streamer, utilizing built-in consensus coordination to manage cluster state and leader election without requiring external coordinator services. It implements the Kafka wire protocol, allowing it to communicate with existing ecosystem clients and tools. The platform provides capabilities for distributed log
ToyDB is a distributed SQL database that provides a system for storing and querying data across multiple nodes. It focuses on maintaining strong consistency and fault tolerance through the implementation of a distributed consensus algorithm. The project distinguishes itself by supporting historical data versioning, enabling time-travel queries to retrieve the state of the database from a specific point in the past. It utilizes multi-version concurrency control to manage ACID transactions and ensure data integrity during concurrent operations. The system covers relational data modeling with t
Seata is a distributed transaction coordinator designed to ensure data consistency and atomicity across microservices. It provides a centralized framework for managing global transactions, preventing partial data updates across different databases and services. The project implements multiple transaction modes to balance consistency and performance. This includes an automatic mode that uses rollback logs to coordinate compensation without modifying business logic, a try-confirm-cancel pattern for resources lacking native ACID support, and a saga orchestration engine for managing long-lived bu
Cassandra is a distributed NoSQL database and wide-column store designed for high availability and linear scalability. It functions as a fault-tolerant distributed system that utilizes an LSM-tree storage engine to optimize write throughput and manage massive datasets. The system is a CQL-compliant database, using a structured query language to manage and retrieve tabular data stored across multiple nodes. It organizes information into rows and columns based on a flexible schema and primary keys. The project provides capabilities for horizontal database scaling, distributed data partitioning
RocketMQ is a cloud-native distributed messaging platform and streaming engine. It functions as a distributed transactional queue that ensures atomicity between local transactions and message delivery, and serves as an MQTT IoT message broker to bridge lightweight device traffic into high-performance data streams. The system is distinguished by a Kubernetes-native architecture that decouples compute from storage to allow independent scaling of traffic and data retention. It utilizes a tiered storage model to offload older data to remote storage and employs quorum-based replication and automat
brpc is a high-performance C++ RPC framework and network programming library designed for building distributed systems. It functions as a multi-protocol RPC server capable of hosting and detecting multiple communication protocols, including gRPC, Thrift, HTTP, Redis, and Memcached, on a single TCP port. The project distinguishes itself through high-throughput data transport and memory efficiency, utilizing RDMA-based transport to bypass the kernel TCP stack and zero-copy memory management to eliminate data duplication. It also implements the Raft algorithm for consensus-based state replicatio
dtm is a distributed transaction framework and workflow orchestrator designed to manage data consistency across microservices. It functions as a centralized coordinator that synchronizes distributed transactions and manages the state and execution flow of complex sequences of distributed tasks. The framework implements several coordination patterns, including Saga, TCC, and XA, to prevent synchronization errors. It specifically provides a multi-language transaction coordinator to synchronize operations across services written in different programming languages and utilizes a reliable messagin
Faust is a Python library for building distributed stream processing applications that integrate with Kafka. It functions as an asynchronous stream processor designed to handle high-throughput event streams and real-time data analysis using asynchronous functions. The system operates as a distributed stream processor and state store, utilizing sharding and partitioned topics to scale processing workloads horizontally across multiple worker nodes. It maintains state through a replicated key-value storage system backed by local databases to ensure high availability and fast recovery. The frame
Pebble is an embedded key-value storage engine written in Go, designed as a library that provides durable, write-optimized data persistence directly within applications. It organizes data using a log-structured merge-tree (LSM-tree) structure, where writes are first buffered in an in-memory skiplist memtable and persisted to a write-ahead log before being flushed to block-based SSTable files on disk. The engine supports atomic batch commits, configurable write synchronization, and automatic background compaction that merges and rewrites sorted runs to reclaim space and maintain read performanc
Vitess is a database clustering system for horizontal scaling of MySQL. It functions as a middleware layer that abstracts complex sharding and physical topology, allowing applications to interact with a distributed database environment through a unified interface. By intercepting and routing SQL queries across multiple shards, it enables large-scale data management while maintaining the appearance of a single database instance. The platform distinguishes itself through its ability to perform online schema migrations and distributed transaction coordination without requiring application downti
Apache Pulsar is a cloud-native distributed pub-sub messaging system designed for high-performance data ingestion. It functions as a geo-replicated data streamer and a multi-tenant event streaming platform, providing a serverless stream processing engine and a tiered storage messaging broker. The system distinguishes itself by separating serving layers from storage layers to allow independent scaling of compute and data retention. It features native geo-replication to synchronize messages across different geographical regions and employs a multi-layered tenant isolation model using authentica
This project provides educational materials and courseware focused on the theoretical and practical foundations of distributed systems design. It serves as a comprehensive curriculum covering the disciplines of consensus, data consistency, reliability engineering, and scalability. The instructional content focuses on achieving cluster agreement through consensus algorithms and managing system-wide state via coordination frameworks. It includes a dedicated guide to data theory, exploring replication strategies, consistency models, and data convergence. The courseware covers a broad capability
GreptimeDB is a distributed, open-source time-series database built for unified observability. It stores and queries metrics, logs, and traces together in a single columnar engine, supporting both SQL and PromQL for analysis. The database is designed as a Kubernetes-native operator with a decoupled compute and storage architecture, enabling horizontal scaling and multi-region deployment. What distinguishes GreptimeDB is its role as a multi-protocol ingestion gateway, accepting data through OpenTelemetry, Prometheus Remote Write, InfluxDB, Loki, Elasticsearch, Kafka, and MQTT protocols without
Consul is a distributed coordination service and service mesh tool used for service discovery, health monitoring, and cluster state management across dynamic networks. It provides a platform for locating network addresses of services and managing traffic across distributed infrastructure using DNS and HTTP interfaces. The project distinguishes itself through multi-datacenter network orchestration, enabling the federation of services across different regions using mesh gateways. It secures communication via a service mesh architecture that employs identity-based authorization and mutual TLS en