30 Repos
Architectural patterns for partitioning large datasets across multiple database nodes.
Distinguishing note: Focuses on horizontal partitioning rather than general database management.
Explore 30 GitHub repositories matching data & databases · Database Sharding.
Developer Roadmap is a community-driven platform that offers structured, graph-based learning paths for software engineering. It serves as a comprehensive knowledge repository in which technical domains are organized into visual sequences that guide the acquisition of professional skills and career growth. The project stands out for its collaborative ecosystem, which lets users contribute roadmaps, curate proven industry practices, and maintain professional profiles. It integrates diagnostic assessment frameworks to evaluate technical proficiency, helping developers identify knowledge gaps and prepare for professional job interviews through targeted learning sequences. Beyond its core mapping features, the platform offers practical project ideas and interactive tutoring to reinforce engineering concepts. It provides a central space for the community to share resources, track progressive skill-building, and navigate complex technical landscapes.
Shards database data to support large-scale roadmap storage.
This project is a comprehensive Java backend engineering guide and technical reference focused on high-concurrency design, distributed systems, and microservices architecture. It provides detailed strategies for decomposing monolithic applications, managing service discovery, and implementing the architectural patterns required for scalable backend environments. The repository distinguishes itself through an extensive collection of big data algorithmic references and database scaling strategies. It covers memory-efficient techniques for analyzing massive datasets, such as Top-K element extraction…
Documents architectural patterns for partitioning large datasets across multiple database nodes to handle high volumes.
This project is a comprehensive reference collection of practical implementation examples and patterns for building applications with Spring Boot. It serves as a Java web application template and a showcase for developing functional web services featuring REST endpoints, template engines, and global exception handling. The repository distinguishes itself by providing detailed demonstrations of enterprise-grade features, including distributed locking, task scheduling, and asynchronous message exchange using brokers like RabbitMQ. It also includes reference implementations for automated API documentation…
Distributes data across multiple databases and tables to improve horizontal scalability and performance.
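Splitting data across both databases and tables means each row key must resolve to a (database, table) pair. A minimal sketch, assuming a hypothetical layout of 2 databases with 4 tables each and plain modulo routing; the names `orders_db_N` and `t_order_M` are illustrative only, not taken from the repository.

```python
DB_COUNT = 2        # hypothetical number of physical databases
TABLES_PER_DB = 4   # hypothetical number of tables inside each database


def route(user_id: int) -> tuple[str, str]:
    """Map a user_id to a (database, table) pair using modulo arithmetic."""
    slot = user_id % (DB_COUNT * TABLES_PER_DB)  # global slot 0..7
    db_index = slot // TABLES_PER_DB             # which database owns the slot
    table_index = slot % TABLES_PER_DB           # which table inside that database
    return f"orders_db_{db_index}", f"t_order_{table_index}"


print(route(13))
```

Numbering slots globally keeps the mapping easy to reason about, but changing either count remaps most keys, which is why real deployments plan capacity or use resharding tooling.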
InfluxDB is a specialized time series database platform engineered for the high-speed ingestion, compression, and retrieval of timestamped data at scale. It functions as a distributed metrics platform, providing the infrastructure necessary to organize and analyze massive volumes of time-stamped information to identify trends, patterns, and anomalies within complex data streams. The platform distinguishes itself through a functional dataflow engine that utilizes a specialized programming language for complex analytical transformations and automated tasks. This architecture is supported by a …
Distributes data across multiple physical storage segments to enable horizontal scaling and parallel query execution.
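For time series data, one common way to pick a storage segment is to bucket points by timestamp so that each segment covers a fixed time window. A minimal sketch, assuming a fixed 7-day window aligned to the Unix epoch; this illustrates time-based partitioning in general and is not InfluxDB's actual implementation.

```python
from datetime import datetime, timedelta, timezone

SHARD_DURATION = timedelta(days=7)  # assumed width of each time-based segment
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_group_start(ts: datetime) -> datetime:
    """Truncate a timestamp down to the start of the time window that holds it."""
    elapsed = ts - EPOCH
    groups = elapsed // SHARD_DURATION  # whole windows since the epoch (an int)
    return EPOCH + groups * SHARD_DURATION
```

Because every point in a window lands in the same segment, a query over a time range only needs to touch the segments whose windows overlap it, and expired windows can be dropped as whole units.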
This project is a distributed, document-oriented database system designed to store information in flexible, hierarchical structures. It supports horizontal scaling through automated sharding and maintains high availability across global clusters using a multi-node replication protocol. By executing multi-document operations as atomic units, the system ensures data integrity and consistency across distributed environments. The platform distinguishes itself by integrating advanced vector-based indexing, which enables semantic similarity searches alongside traditional geospatial and lexical queries…
Partitions data across multiple machines based on shard keys to increase storage capacity and throughput for large datasets.
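Routing by shard key means every document with the same key value deterministically lands on the same machine. A minimal sketch of hashed shard-key routing, assuming three hypothetical shard names and MD5 as a stand-in hash; production systems typically assign hash ranges (chunks) to shards rather than taking a plain modulo.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names


def shard_for(key: str) -> str:
    """Hash the shard key and map the result onto one of the shards."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[bucket]
```

Hashing spreads sequential keys (such as increasing IDs) evenly, at the cost of turning range queries on the key into scatter-gather queries across all shards.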
RethinkDB is a distributed, document-oriented database designed to store and manage JSON-formatted data across scalable clusters. It utilizes a custom log-structured storage engine with B-Tree indexing to ensure high-performance disk I/O and data persistence. The system maintains high availability through automatic sharding and replication, employing a primary-replica voting consensus mechanism to handle node failures and ensure consistent cluster operations. A defining characteristic of the platform is its reactive changefeed engine, which allows applications to subscribe to live data updates…
RethinkDB partitions data across a cluster using a range-based sharding algorithm that automatically distributes documents based on their primary key values to ensure balanced storage.
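Range-based sharding assigns each shard a contiguous interval of primary-key values, so locating a document is a binary search over the split points. A minimal sketch with hypothetical string split points; RethinkDB chooses and rebalances its own split points, which this does not model.

```python
import bisect

# Hypothetical split points: shard 0 holds keys < "g",
# shard 1 holds "g" <= key < "p", shard 2 holds keys >= "p".
SPLIT_POINTS = ["g", "p"]


def shard_index(primary_key: str) -> int:
    """Return the index of the shard whose key range contains primary_key."""
    return bisect.bisect_right(SPLIT_POINTS, primary_key)
```

Unlike hashing, ranges keep neighbouring keys together, so range scans touch few shards; the trade-off is that skewed or monotonically increasing keys can overload one range until it is split.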
Vitess is a distributed MySQL orchestrator and clustering system designed for horizontal database scaling. It functions as sharding middleware that distributes data and load across multiple MySQL instances to handle growth beyond the capacity of a single machine. The system provides a proxy layer that abstracts data distribution, allowing applications to query a cluster as a single logical database without knowing the physical location of the data. This is achieved through a routing mechanism that intercepts queries and directs them to the appropriate shards based on keyspace mappings. …
Manages the splitting and merging of data across multiple MySQL instances to optimize performance.
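If each shard owns a contiguous range of a keyspace, splitting and merging reduce to interval bookkeeping. A minimal sketch, assuming the keyspace is simplified to the first byte (0x00 to 0xFF) and that `KeyRange`, `split`, and `merge` are hypothetical helpers; it covers only which range each shard owns, not the data movement a real resharding workflow performs.

```python
from typing import NamedTuple


class KeyRange(NamedTuple):
    start: int  # inclusive lower bound
    end: int    # exclusive upper bound


def split(r: KeyRange) -> tuple[KeyRange, KeyRange]:
    """Split one shard's range into two halves at the midpoint."""
    mid = (r.start + r.end) // 2
    return KeyRange(r.start, mid), KeyRange(mid, r.end)


def merge(a: KeyRange, b: KeyRange) -> KeyRange:
    """Merge two adjacent ranges back into one shard's range."""
    if a.end != b.start:
        raise ValueError("ranges are not adjacent")
    return KeyRange(a.start, b.end)
```

Usage: `split(KeyRange(0, 256))` yields the two halves `KeyRange(0, 128)` and `KeyRange(128, 256)`, and merging them restores the original range.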
ShardingSphere is a distributed SQL database middleware that provides sharding, read-write splitting, and distributed transaction management for relational databases. It functions as a layer that intercepts SQL queries to distribute data across multiple physical database instances for horizontal scaling. The project is distinguished by its ability to operate as either a standalone transparent database proxy or via direct integration as a JDBC driver. It features a SQL dialect translator that parses queries into abstract syntax trees to convert syntax between different database engines, enabling…
Implements architectural patterns for partitioning large datasets across multiple database nodes for horizontal scaling.
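Read-write splitting, mentioned for several middleware projects in this list, sends data-modifying statements to the primary and spreads reads across replicas. A minimal sketch, assuming a hypothetical `ReadWriteRouter` that classifies statements by their leading keyword and balances reads round-robin; real middleware parses SQL fully and also handles cases such as `SELECT ... FOR UPDATE` and reads inside transactions, which this ignores.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads over replicas round-robin."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # endless round-robin iterator

    def route(self, sql: str) -> str:
        # Classify by the first keyword only (a deliberate simplification).
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._replicas)
```

Usage: with `ReadWriteRouter("primary", ["r1", "r2"])`, an `INSERT` goes to `"primary"`, while successive `SELECT` statements alternate between `"r1"` and `"r2"`.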
Vitess is a database clustering system for horizontal scaling of MySQL. It functions as a middleware layer that abstracts complex sharding and physical topology, allowing applications to interact with a distributed database environment through a unified interface. By intercepting and routing SQL queries across multiple shards, it enables large-scale data management while maintaining the appearance of a single database instance. The platform distinguishes itself through its ability to perform online schema migrations and distributed transaction coordination without requiring application downtime…
Distributes data across multiple nodes using keyspace definitions and routing rules to enable horizontal scaling of large-scale database workloads.
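Keyspace-based routing can be pictured as hashing a sharding column into a keyspace ID and finding the shard whose key range contains it. A minimal sketch, assuming shard names written as hex key ranges in the style of Vitess (`-40`, `40-80`, ...) and SHA-256 as a stand-in hash; Vitess's own vindexes use their own functions, so `keyspace_id` here is hypothetical.

```python
import hashlib

SHARDS = ["-40", "40-80", "80-c0", "c0-"]  # hypothetical shard key ranges (hex)


def keyspace_id(value: int) -> bytes:
    # Assumption: stand-in hash truncated to 8 bytes, not a real Vitess vindex.
    return hashlib.sha256(str(value).encode()).digest()[:8]


def covers(shard: str, kid: bytes) -> bool:
    """True if the shard's [start, end) range contains the keyspace ID."""
    start_hex, end_hex = shard.split("-")
    start, end = bytes.fromhex(start_hex), bytes.fromhex(end_hex)
    # An empty bound means the range is open on that side; bytes compare lexicographically.
    return start <= kid and (not end or kid < end)


def route(value: int) -> str:
    kid = keyspace_id(value)
    return next(s for s in SHARDS if covers(s, kid))
```

Because the four ranges tile the whole byte space, every keyspace ID is covered by exactly one shard, and splitting a shard only means replacing one range with two narrower ones.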
Reddit is a social news aggregator designed for hosting community-driven discussions and content sharing through threaded conversations and user-submitted links. It functions as a platform for managing large volumes of user-generated content, providing a structured interface for programmatic access to site data and core application functionality. The platform utilizes a REST API to expose site data and user interactions to external clients. To maintain performance across large datasets, it employs an external full-text search engine that offloads indexing and query processing from the primary…
Partitions massive datasets across multiple physical storage nodes to ensure horizontal scalability.
Citus is a PostgreSQL extension that transforms a standard database into a distributed system. It functions as a sharding framework and distributed SQL engine, enabling horizontal scaling by partitioning tables across a cluster of nodes. By utilizing a coordinator-worker topology, the system manages metadata and routes queries to the appropriate nodes, allowing for parallel execution of complex operations across distributed data shards. The platform distinguishes itself through its specialized support for multi-tenant architectures and real-time analytical processing. It enables tenant-based…
Distributed database systems split tables into shards based on a distribution column to maximize hardware efficiency and tenant density in multi-tenant database environments.
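With a distribution column, each row's column value is hashed into a signed 32-bit space that is divided into equal, contiguous shard intervals. A minimal sketch, assuming 4 shards and `zlib.crc32` as a stand-in for the database's own hash function; `hash_int32` and `shard_for_tenant` are hypothetical helpers.

```python
import zlib

SHARD_COUNT = 4
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
RANGE = 2**32 // SHARD_COUNT  # width of each shard's hash interval


def hash_int32(value: str) -> int:
    """Stand-in hash: CRC32 reinterpreted as a signed 32-bit integer."""
    h = zlib.crc32(value.encode("utf-8"))  # 0 .. 2**32 - 1
    return h - 2**32 if h > INT32_MAX else h


def shard_for_tenant(tenant_id: str) -> int:
    """Map a distribution-column value to the shard whose interval holds its hash."""
    return (hash_int32(tenant_id) - INT32_MIN) // RANGE
```

Using the tenant ID as the distribution column places all of a tenant's rows, across every table distributed the same way, on the same shard, so per-tenant joins can run locally on one node.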
Electric is a Postgres data synchronization engine and replication proxy designed to enable local-first software. It replicates data from Postgres databases to client-side stores in real time using logical replication, allowing applications to maintain a local embedded database for offline access and low-latency updates. The system distinguishes itself by using shapes to filter and authorize specific subsets of database rows and columns before streaming them to clients or edge workers. It further supports multi-user collaboration by integrating a conflict-free replicated data type framework…
Directs client connections to specific database shards using client-side logic or edge routing to distribute load.
Mycat-Server is a MySQL database middleware system that functions as a sharding proxy, distributed database coordinator, and high availability manager. It acts as a proxy layer that routes SQL traffic between applications and multiple backend MySQL database instances to enable horizontal scaling. The system coordinates distributed transactions, generates global unique sequences to prevent primary key collisions, and executes distributed join queries across multiple database shards. It includes a load balancer that performs read-write splitting by directing traffic between primary and slave nodes…
Provides a sharding architecture that distributes relational data across multiple physical nodes for horizontal scaling.
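Global unique sequences matter because per-shard auto-increment counters would hand out colliding primary keys. A minimal sketch of one common approach, a Snowflake-style 64-bit ID (41 bits of milliseconds, 10 bits of node ID, 12 bits of per-millisecond sequence); this layout is a swapped-in technique for illustration, not Mycat's sequence implementation, and the custom epoch is an assumption.

```python
import threading
import time


class GlobalIdGenerator:
    """Snowflake-style IDs: 41 bits time | 10 bits node | 12 bits sequence."""

    def __init__(self, node_id: int, epoch_ms: int = 1_600_000_000_000):
        if not 0 <= node_id < 1024:
            raise ValueError("node_id must fit in 10 bits")
        self.node_id = node_id
        self.epoch_ms = epoch_ms  # assumed custom epoch (September 2020)
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF
                if self._seq == 0:  # 4096 IDs used this millisecond
                    while now <= self._last_ms:  # wait for the next millisecond
                        now = int(time.time() * 1000)
            else:
                self._seq = 0
            self._last_ms = now
            return ((now - self.epoch_ms) << 22) | (self.node_id << 12) | self._seq
```

Each proxy or shard node gets a distinct `node_id`, so IDs never collide across nodes without any coordination, and they sort roughly by creation time.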
This project is a comprehensive knowledge base and study resource designed for mastering technical interviews. It provides structured guides, roadmaps, and curricula focused on data structures, algorithms, system design, and frontend engineering to help candidates prepare for software engineering screenings. The repository distinguishes itself by offering a holistic approach to professional advancement. Beyond technical drills, it includes a career development handbook covering resume optimization, salary benchmarking, and strategic negotiation coaching. It also provides detailed methodologies…
Includes study materials on horizontal partitioning patterns to improve database scalability.
Soar is a suite of specialized tools designed for analyzing MySQL performance, advising on indexing, and optimizing SQL syntax. It functions as a performance analyzer, index advisor, and query optimizer to identify bottlenecks and suggest structural improvements for faster execution. The project distinguishes itself through a system for rewriting SQL statements into optimized equivalent versions using custom heuristic rules and patterns. It also features a dedicated index advisor that evaluates query patterns and database metadata to recommend the creation of new indexes. Its broader capabilities…
Transforms SQL queries using custom rewrite rules to modify statement structures for better execution efficiency.
This project is a comprehensive educational resource and curriculum focused on site reliability engineering, distributed systems, and infrastructure operations. It provides technical guides, a systems engineering course, and instructional manuals designed to teach the principles of managing large-scale computing environments. The curriculum covers high-level architectural design for scalability and resilience, including fault-tolerant infrastructure, high-availability patterns, and microservices decomposition. It emphasizes the practical application of site reliability engineering through the …
Provides educational material on architectural patterns for partitioning large datasets across multiple database nodes.
Kingshard is a MySQL database proxy and sharding middleware that routes SQL traffic between clients and multiple database nodes. It functions as a load balancer, read-write splitter, and SQL query firewall to manage how data is accessed and distributed across a database infrastructure. The system implements data sharding using hash, range, or date strategies to split tables across multiple nodes. It enables read-write splitting by directing data modification requests to a master node while distributing read-only queries across a pool of slave replicas. The proxy provides traffic management…
Provides the ability to override automatic sharding logic by specifying the target node via SQL comments.
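A comment-based override lets a client pin one statement to a specific node while everything else goes through the normal sharding rules. A minimal sketch, assuming a hypothetical leading hint of the form `/*node2*/`; the exact hint syntax any given proxy accepts should be checked in its own documentation, and `pick_node` is an illustrative helper.

```python
import re
from typing import Callable

# Hypothetical hint format: a leading comment naming the target node, e.g. /*node2*/
HINT = re.compile(r"^\s*/\*\s*(node\d+)\s*\*/\s*", re.IGNORECASE)


def pick_node(sql: str, default_router: Callable[[str], str]) -> tuple[str, str]:
    """Return (node, sql_without_hint); an explicit hint bypasses the sharding logic."""
    m = HINT.match(sql)
    if m:
        return m.group(1).lower(), sql[m.end():]
    return default_router(sql), sql
```

Stripping the hint before forwarding keeps the backend from seeing proxy-specific syntax; statements without a hint fall through to whatever hash, range, or date rule `default_router` implements.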
OmniRoute is a unified LLM API gateway that connects multiple AI providers to a single endpoint. Its primary purpose is to simplify the integration of various AI models into tools and agents by translating different provider formats into a standardized API. The project distinguishes itself through a multi-strategy request routing system that optimizes for cost, speed, and availability, including automatic model fallbacks and a circuit-breaker resilience model to isolate provider failures. It employs a local-first security posture, using AES-256-GCM encryption to store API keys and conversation…
Tests how requests flow through different strategy combinations without sending actual data to a provider.
This is a backend-as-a-service SDK that connects web and mobile applications to a suite of cloud services. It offers a unified interface for managing user identities, executing serverless logic, and handling cloud object storage. The toolkit stands out for its real-time data synchronization, which keeps NoSQL document data consistent across multiple clients and includes built-in offline persistence. It enables secure user access through a variety of identity providers and manages the invocation of serverless functions that run backend logic in response to HTTPS requests or database events. The SDK covers a broad range of operational capabilities, including NoSQL and relational database management, crash monitoring, and user-behavior analytics. It also provides tools for remote application configuration, targeted push notifications, and the integration of large language models for AI-powered features. The project is implemented in TypeScript and offers language-specific libraries that abstract REST and WebSocket APIs into high-level methods.
Supports partitioning large datasets across multiple database instances to handle increased load.
Sequel is a relational database toolkit for Ruby that provides object-relational mapping, a fluent SQL query builder, and schema migration capabilities. It maps database tables to Ruby classes with support for associations, validations, lifecycle hooks, and eager loading, offering a comprehensive ORM layer for building data-centric applications. Sequel distinguishes itself through a plugin-based extension architecture that allows composable customization of models, databases, and datasets without relying on deep inheritance hierarchies. It includes a thread-safe connection pool with support…
Ensures model instances retrieved from or created on a shard are saved to the same shard.
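Shard affinity for model instances comes down to remembering which shard an object was loaded from or created on, and writing back to that shard rather than a default connection. A minimal sketch, assuming an in-memory `ShardStore` and hypothetical `load`, `create`, and `save` helpers; it illustrates the idea in Python and is not Sequel's Ruby API.

```python
from dataclasses import dataclass, field


class ShardStore:
    """Hypothetical in-memory stand-in for several database shards."""

    def __init__(self, names: list[str]):
        self.shards: dict[str, dict[int, dict]] = {name: {} for name in names}


@dataclass
class Record:
    pk: int
    data: dict
    shard: str = field(default="", repr=False)  # shard this instance belongs to


def save(store: ShardStore, rec: Record) -> None:
    # Write back to the instance's own shard instead of a default connection.
    store.shards[rec.shard][rec.pk] = dict(rec.data)


def load(store: ShardStore, shard: str, pk: int) -> Record:
    # Tag the loaded instance with the shard it came from.
    return Record(pk, dict(store.shards[shard][pk]), shard=shard)


def create(store: ShardStore, shard: str, pk: int, data: dict) -> Record:
    rec = Record(pk, data, shard=shard)
    save(store, rec)
    return rec
```

Usage: a record created on `"west"` and later loaded, modified, and saved is written back to `"west"`, and no other shard ever receives a copy.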