# Specialized and distributed databases

> Search results for `Specialized and distributed databases` on awesome-repositories.com. 116 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/specialized-and-distributed-databases

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/specialized-and-distributed-databases).**

## Results

- [distribution/distribution](https://awesome-repositories.com/repository/distribution-distribution.md) (10,479 ⭐) — Distribution is an open-source container image registry that implements the OCI Distribution Specification, enabling any OCI-compatible client to push, pull, and manage container images over standard protocols. It serves as a content distribution toolkit for packaging, shipping, storing, and delivering container content across networked environments, storing and retrieving content by its cryptographic hash for integrity and deduplication.

The registry separates image metadata from bulk data to enable efficient validation and partial pulls, and supports resumable blob uploads with chunked tran
- [cockroachdb/cockroach](https://awesome-repositories.com/repository/cockroachdb-cockroach.md) (32,207 ⭐) — CockroachDB is a distributed SQL database designed to scale horizontally across multiple nodes while maintaining strict ACID compliance and global data consistency. It functions as a relational database engine that automatically partitions data into ranges, rebalancing them across a cluster to accommodate growing storage and throughput requirements. By utilizing a distributed consensus protocol, the system ensures that all nodes agree on the order of operations, providing fault tolerance and continuous availability even in the event of hardware failures.

The system distinguishes itself through
- [b4rtaz/distributed-llama](https://awesome-repositories.com/repository/b4rtaz-distributed-llama.md) (2,837 ⭐) — Distributed-llama is a distributed inference engine and command line tool for running large language models across multiple networked machines. It functions as a compute cluster manager that coordinates worker nodes to share the computational load of a single model.

The system utilizes tensor parallelism to shard model weights across different hosts, allowing the execution of models that exceed the memory capacity of a single piece of hardware. It includes a dedicated format converter to transform standard model files into a compatible binary layout optimized for distributed loading.

The eng
- [dask/dask](https://awesome-repositories.com/repository/dask-dask.md) (13,746 ⭐) — Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements.

The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
- [brendangregg/flamegraph](https://awesome-repositories.com/repository/brendangregg-flamegraph.md) (19,307 ⭐) — FlameGraph is a performance profiling and visualization toolkit designed to identify bottlenecks in software execution. It functions as a processing engine that transforms raw stack trace samples into interactive, hierarchical diagrams. By representing aggregated execution frequency as nested rectangles, the tool allows developers to visualize hot code paths and analyze system behavior across both kernel and user-space environments.

The project distinguishes itself through its ability to perform differential profile analysis, which highlights performance regressions or improvements by compari
- [sciruby/distribution](https://awesome-repositories.com/repository/sciruby-distribution.md) (51 ⭐) — Probability distributions for Ruby.
- [greptimeteam/greptimedb](https://awesome-repositories.com/repository/greptimeteam-greptimedb.md) (5,968 ⭐) — GreptimeDB is a distributed, open-source time-series database built for unified observability. It stores and queries metrics, logs, and traces together in a single columnar engine, supporting both SQL and PromQL for analysis. The database is designed as a Kubernetes-native operator with a decoupled compute and storage architecture, enabling horizontal scaling and multi-region deployment.

What distinguishes GreptimeDB is its role as a multi-protocol ingestion gateway, accepting data through OpenTelemetry, Prometheus Remote Write, InfluxDB, Loki, Elasticsearch, Kafka, and MQTT protocols without
- [apple/foundationdb](https://awesome-repositories.com/repository/apple-foundationdb.md) (16,446 ⭐) — FoundationDB is an ACID-compliant distributed transactional key-value store. It functions as a scalable database engine that ensures strict serializability and data consistency across a cluster of servers using a shared-nothing architecture.

The system is distinguished by its multi-region replication capabilities, allowing data to be synchronized across different datacenters for high availability and disaster recovery. It utilizes optimistic concurrency control to manage distributed transactions and employs a majority-based coordination system to maintain cluster state.

The platform provides
- [olric-data/olric](https://awesome-repositories.com/repository/olric-data-olric.md) (3,469 ⭐) — Olric is a distributed data grid and in-memory key-value store that partitions and replicates data across a cluster of servers. It serves as a shared memory system for managing distributed maps, performing atomic operations, and acting as an in-memory data cache.

The system provides a distributed locking mechanism for concurrency control and a pub-sub messaging system that broadcasts and routes messages over named channels across the cluster.

The platform covers wide-ranging capabilities including cluster management and orchestration, data replication with configurable quorums, and automated
- [docker/distribution](https://awesome-repositories.com/repository/docker-distribution.md) (10,474 ⭐) — This project is a container image registry and server-side storage system designed to house container images, layers, and manifests. It functions as an OCI compliant registry server that adheres to the Open Container Initiative Distribution Specification to store and deliver content over HTTP.

The system provides a self-hosted solution for managing private libraries of container images within professional-grade infrastructure. It is designed to enable the development of custom registries by extending a base toolkit with specialized libraries and business logic.

The registry covers image dist
- [bytebytegohq/system-design-101](https://awesome-repositories.com/repository/bytebytegohq-system-design-101.md) (83,491 ⭐) — This project is a centralized engineering knowledge repository that provides a structured curriculum for mastering system design, architectural patterns, and fundamental software development workflows. It serves as a professional development resource for engineers, offering foundational knowledge and real-world case studies to support the design of scalable, secure, and efficient distributed systems.

The repository distinguishes itself through a visual-first approach to knowledge synthesis, distilling complex technical concepts into high-density graphical diagrams and succinct illustrations.
- [dask/distributed](https://awesome-repositories.com/repository/dask-distributed.md) (1,671 ⭐) — A distributed task scheduler for Dask
- [apache/doris](https://awesome-repositories.com/repository/apache-doris.md) (15,526 ⭐) — Doris is a distributed SQL data warehouse designed for high-performance analytical workloads and real-time data processing. It functions as a unified platform that integrates traditional relational warehousing with lakehouse query capabilities, allowing users to execute analytical operations directly against external data lakes without requiring data migration.

The system distinguishes itself through a shared-nothing, massively parallel processing architecture that utilizes vectorized query execution and columnar storage to maintain sub-second latency. It supports dynamic schema evolution, en
- [drizzle-team/drizzle-orm](https://awesome-repositories.com/repository/drizzle-team-drizzle-orm.md) (34,835 ⭐) — Drizzle ORM is a TypeScript-native database toolkit providing type-safe SQL query building, schema management, and automated migrations across PostgreSQL, MySQL, SQLite, and SingleStore.
- [encode/databases](https://awesome-repositories.com/repository/encode-databases.md) (4,002 ⭐) — Async database support for Python. 🗄
- [vesoft-inc/nebula](https://awesome-repositories.com/repository/vesoft-inc-nebula.md) (12,239 ⭐) — Nebula is a distributed graph database designed for storing and querying massive volumes of interconnected vertices and edges across a horizontally scalable cluster. It functions as a Kubernetes-native database and a distributed graph analytics engine, utilizing a Raft-based distributed store to ensure strong consistency and high availability.

The system features an OpenCypher query engine for performing complex graph traversals and pattern matching. It distinguishes itself with a decoupled compute-storage architecture and a shared-nothing distributed design, allowing query processing and dat
- [google/sentencepiece](https://awesome-repositories.com/repository/google-sentencepiece.md) (11,657 ⭐) — SentencePiece is a text segmentation engine and tokenization library designed for machine learning workflows. It provides a comprehensive toolkit for transforming raw text into subword units or numerical identifiers, enabling consistent data representation for neural network training and inference. The library supports the training of segmentation models from raw text, allowing for the creation of custom vocabularies tailored to specific domain requirements.

The project distinguishes itself through its byte-level encoding and fallback mechanisms, which ensure that every input can be represent
- [prestodb/presto](https://awesome-repositories.com/repository/prestodb-presto.md) (16,711 ⭐) — Presto is a distributed SQL query engine designed for high-performance analytical processing across heterogeneous data sources. It functions as a data federation platform and massively parallel processing engine, allowing users to execute interactive queries against diverse storage systems without requiring data migration. By mapping remote metadata and structures to a unified relational namespace, it enables seamless cross-platform analysis through a standard SQL interface.

The engine distinguishes itself through a pluggable connector architecture and a shared-nothing distributed processing
- [nodesource/distributions](https://awesome-repositories.com/repository/nodesource-distributions.md) (13,834 ⭐) — This project is a Node.js binary distribution repository and Linux package repository. It provides a hosted set of pre-compiled JavaScript runtime binaries for various Linux distributions to simplify installation and version management through native package managers.

The project includes a Node.js observability toolset and security policy manager. These components enable the gathering of runtime telemetry to monitor application health and performance via diagnostic dashboards, while providing a resource restriction layer that intercepts system calls to prevent unauthorized modules from acces
- [apache/datafusion](https://awesome-repositories.com/repository/apache-datafusion.md) (8,908 ⭐) — Apache DataFusion is an extensible, columnar SQL query engine that runs embedded within a host application without requiring a separate server process. It processes data in columnar batches using Apache Arrow for memory-efficient analytics, and can scale analytic workloads across multiple nodes for parallel execution. The engine supports both SQL and DataFrame queries through a modular, streaming architecture that allows custom operators, data sources, functions, and optimizer rules.

The engine distinguishes itself through its modular extension framework, which enables building custom query e
- [illuminate/database](https://awesome-repositories.com/repository/illuminate-database.md) (2,766 ⭐) — [READ ONLY] Subtree split of the Illuminate Database component (see laravel/framework)
- [gam-team/gam](https://awesome-repositories.com/repository/gam-team-gam.md) (4,206 ⭐) — GAM is a command-line tool for administering Google Workspace and Cloud Identity. It translates command-line arguments into structured API calls, enabling administrators to manage users, groups, organizational units, and domain settings across a Google Workspace environment. The tool handles authentication through OAuth2 flows, service accounts, and workload identity federation, and supports multi-tenant configurations for managing multiple domains or cloud projects from a single installation.

GAM distinguishes itself through its batch processing and automation capabilities. It can process la
- [madd86/awesome-system-design](https://awesome-repositories.com/repository/madd86-awesome-system-design.md) (11,695 ⭐) — This project is a comprehensive learning resource and reference guide for software architecture and distributed systems design. It serves as a structured curriculum for engineers to study fundamental architectural patterns, scalability strategies, and distributed computing theory, specifically tailored to prepare for technical interviews and professional engineering roles.

The repository distinguishes itself by providing a curated collection of industry-standard infrastructure tools and methodologies. It covers the selection and implementation of technologies for data storage, message brokeri
- [juliastats/distributions.jl](https://awesome-repositories.com/repository/juliastats-distributions-jl.md) (1,193 ⭐) — A Julia package for probability distributions and associated functions.
- [susom/database](https://awesome-repositories.com/repository/susom-database.md) (43 ⭐) — The point of this project is to provide a simplified way of accessing databases. It is a wrapper around the JDBC driver, and tries to hide some of the more error-prone, unsafe, and non-portable parts of the standard API. It uses standard Java types for all operations (as opposed to java.sql.*),…
- [goldendict/goldendict](https://awesome-repositories.com/repository/goldendict-goldendict.md) (6,616 ⭐) — GoldenDict is an offline and online dictionary reader that retrieves word definitions from local files and web sources, displaying entries with full formatting, images, and hyperlinks through an embedded WebKit rendering engine. It reads dictionary files in Babylon, StarDict, Dictd, and ABBYY Lingvo formats without requiring conversion, and supports querying arbitrary websites via user-defined URL templates.

The application integrates with the system through global hotkeys and clipboard monitoring, allowing users to trigger lookups or translate selected text in any other application without m
- [citusdata/citus](https://awesome-repositories.com/repository/citusdata-citus.md) (12,562 ⭐) — Citus is a PostgreSQL extension that transforms a standard database into a distributed system. It functions as a sharding framework and distributed SQL engine, enabling horizontal scaling by partitioning tables across a cluster of nodes. By utilizing a coordinator-worker topology, the system manages metadata and routes queries to the appropriate nodes, allowing for parallel execution of complex operations across distributed data shards.

The platform distinguishes itself through its specialized support for multi-tenant architectures and real-time analytical processing. It enables tenant-based
- [verizonconnect/database-development](https://awesome-repositories.com/repository/verizonconnect-database-development.md) (4 ⭐) — Tooling for deploying, linting and testing relational database code
- [ibax-io/go-ibax](https://awesome-repositories.com/repository/ibax-io-go-ibax.md) (7,858 ⭐) — go-ibax is a blockchain protocol platform and decentralized application infrastructure used to deploy networks with custom governance and token economics. It provides a foundation for building decentralized applications through a framework that integrates identity management and on-chain data storage.

The project features a multilingual virtual machine capable of executing smart contracts written in Go, Rust, and Solidity. It implements a sharded blockchain network to increase throughput and a privacy layer utilizing zero-knowledge proofs and homomorphic encryption to anonymize transaction da
- [fermyon/spin](https://awesome-repositories.com/repository/fermyon-spin.md) (6,443 ⭐) — Spin is a WebAssembly serverless framework and development toolchain for building and running portable microservices. It functions as an event-driven orchestrator and runtime that executes WebAssembly components, allowing developers to map HTTP requests, Redis messages, and cron schedules to specific modules.

The project distinguishes itself by implementing a Wasm-based AI inference gateway, enabling components to perform model inference and generate text embeddings. It utilizes the WebAssembly Component Model and WASI for language-agnostic composition and portable host interfacing, while emp
- [dgraph-io/dgraph](https://awesome-repositories.com/repository/dgraph-io-dgraph.md) (21,700 ⭐) — Dgraph is a distributed graph database designed to store and query highly connected data. It organizes information as nodes and edges to represent complex relationships between entities, providing a platform for managing and analyzing deeply linked datasets.

The system functions as a horizontally scalable cluster that partitions data across multiple nodes to maintain performance and availability as information volume increases. It utilizes a specialized query language built for low-latency navigation of interconnected data points, allowing for the execution of complex queries across large-sca
- [avelino/awesome-go](https://awesome-repositories.com/repository/avelino-awesome-go.md) (175,576 ⭐) — This project serves as a comprehensive language ecosystem index, functioning as a centralized, community-curated directory for the Go programming language. It organizes a vast landscape of software components, libraries, and development tools into a structured, navigable hierarchy, enabling developers to efficiently discover resources tailored to specific functional domains.

The repository distinguishes itself through a decentralized contribution model, where community-driven updates ensure the index remains current with the rapidly evolving software landscape. Beyond simple resource listing,
- [typefacts/alfred-special-characters](https://awesome-repositories.com/repository/typefacts-alfred-special-characters.md) (32 ⭐) — Many special characters needed for good typography are hard to find, some are not even available on the keyboard at all. This workflow gives you access to these characters and pastes them right into your current application.
- [expo/expo](https://awesome-repositories.com/repository/expo-expo.md) (50,111 ⭐) — Expo is a universal mobile framework designed to build native iOS and Android applications from a single codebase using web-standard technologies. It provides a comprehensive development environment that includes a unified runtime for testing, cloud-based infrastructure for compiling and signing native binaries, and automated tools for managing the entire mobile release lifecycle, including app store submission.

The framework distinguishes itself through a plugin-based native configuration engine that programmatically modifies project files, allowing developers to integrate native modules wit
- [jvandevelde/distributed-playground](https://awesome-repositories.com/repository/jvandevelde-distributed-playground.md) (42 ⭐) — Distributed service playground with Vagrant, Consul, Docker & ASP.NET Core
- [rethinkdb/rethinkdb](https://awesome-repositories.com/repository/rethinkdb-rethinkdb.md) (26,996 ⭐) — RethinkDB is a distributed, document-oriented database designed to store and manage JSON-formatted data across scalable clusters. It utilizes a custom log-structured storage engine with B-Tree indexing to ensure high-performance disk I/O and data persistence. The system maintains high availability through automatic sharding and replication, employing a primary-replica voting consensus mechanism to handle node failures and ensure consistent cluster operations.

A defining characteristic of the platform is its reactive changefeed engine, which allows applications to subscribe to live data update
- [huntlabs/hunt-database](https://awesome-repositories.com/repository/huntlabs-hunt-database.md) (49 ⭐) — A database abstraction layer library written in pure D, supporting PostgreSQL and MySQL.
- [appwrite/appwrite](https://awesome-repositories.com/repository/appwrite-appwrite.md) (56,318 ⭐) — Appwrite is a backend-as-a-service platform that provides a unified development environment for building full-stack applications. It integrates essential infrastructure components—including authentication, databases, storage, and serverless functions—into a single, centralized interface to simplify application development and resource management.

The platform distinguishes itself through a container-based microservices architecture that ensures consistent execution across diverse infrastructure. It features a versatile connectivity layer that links frontend applications with third-party servi
- [thanos-io/thanos](https://awesome-repositories.com/repository/thanos-io-thanos.md) (14,121 ⭐) — Thanos is a distributed metrics query engine and monitoring scalability suite designed to provide a unified interface for aggregating data from multiple Prometheus servers and clusters. It functions as a high availability monitoring backend that eliminates single points of failure by deduplicating data from replicated instances.

The system enables long-term retention by persisting time-series data to cloud-native object storage, allowing for unlimited historical archiving beyond the limits of local disks. It further optimizes this storage through a downsampling and retention manager that comp
- [markcox80/specialization-store](https://awesome-repositories.com/repository/markcox80-specialization-store.md) (30 ⭐) — A different type of generic function for Common Lisp.
- [vitessio/vitess](https://awesome-repositories.com/repository/vitessio-vitess.md) (20,788 ⭐) — Vitess is a database clustering system for horizontal scaling of MySQL. It functions as a middleware layer that abstracts complex sharding and physical topology, allowing applications to interact with a distributed database environment through a unified interface. By intercepting and routing SQL queries across multiple shards, it enables large-scale data management while maintaining the appearance of a single database instance.

The platform distinguishes itself through its ability to perform online schema migrations and distributed transaction coordination without requiring application downti
- [duplicati/duplicati](https://awesome-repositories.com/repository/duplicati-duplicati.md) (14,283 ⭐) — Duplicati is a self-hosted backup server designed to perform encrypted, incremental, and compressed backups to a wide range of local, network, and cloud-based storage providers. It functions as a background service that automates recurring data protection tasks, ensuring that only changed data blocks are stored to maximize efficiency and minimize bandwidth usage.

The project distinguishes itself through a centralized management console that allows for the orchestration of multiple distributed backup agents from a single web-based dashboard. It supports multi-tenant management, enabling the or
- [valkey-io/valkey](https://awesome-repositories.com/repository/valkey-io-valkey.md) (24,875 ⭐) — Valkey is an in-memory, NoSQL database server designed for high-performance data storage and real-time state management. It operates as a distributed key-value store, maintaining datasets entirely within system memory to facilitate sub-millisecond response times for read and write operations.

The system distinguishes itself through a single-threaded event loop that utilizes asynchronous I/O multiplexing to ensure high throughput. It supports high availability via master-replica replication and provides a decoupled communication model through a built-in publish-subscribe messaging pattern. To
- [eto-ai/lance](https://awesome-repositories.com/repository/eto-ai-lance.md) (6,671 ⭐) — Lance is a versioned columnar data format and storage engine designed as a multimodal AI lakehouse. It serves as a vector database storage engine and a cloud object store dataset manager, organizing images, video, audio, and embeddings into a unified format optimized for machine learning workflows.

The project distinguishes itself by combining a columnar layout for structured data with a specialized blob store for large multimodal tensors. It implements a hybrid search engine that integrates vector similarity search, full-text search, and SQL analytics on a single dataset, supported by a stor
- [ivopetiz/crypto-database](https://awesome-repositories.com/repository/ivopetiz-crypto-database.md) (107 ⭐) — Database to store all data from crypto exchanges, currently working with Binance, Bittrex, Cryptopia and Poloniex.
- [fastai/fastai](https://awesome-repositories.com/repository/fastai-fastai.md) (27,862 ⭐) — Fastai is a high-level deep learning library built on PyTorch that provides a unified interface for managing the entire machine learning lifecycle. It functions as a comprehensive training toolkit, abstracting hardware management and automating complex training loops to simplify the construction and execution of neural network models.

The framework is distinguished by its notebook-centric development environment and a type-dispatching data pipeline that automatically applies transformations based on input data formats. It emphasizes transfer learning through discriminative layer-wise optimiza
- [orbitdb/orbit-db](https://awesome-repositories.com/repository/orbitdb-orbit-db.md) (8,791 ⭐) — OrbitDB is a decentralized NoSQL database that utilizes conflict-free replicated data types to ensure eventual consistency across a network of nodes. It functions as a peer-to-peer data store that uses IPFS for content-addressing and synchronization, allowing for the maintenance of application state without a central server or authority.

The system is built upon a cryptographically verifiable, immutable operation log, which serves as the foundation for custom decentralized data models. This architecture enables the implementation of various data storage patterns, including JSON document stor
- [kazhuravlev/database-gateway](https://awesome-repositories.com/repository/kazhuravlev-database-gateway.md) (37 ⭐) — Safe access to production databases
- [pingcap/tidb](https://awesome-repositories.com/repository/pingcap-tidb.md) (40,166 ⭐) — TiDB is a horizontally scalable, distributed SQL database designed to provide consistent transactional storage and high-performance analytical processing within a single unified architecture. It utilizes a decoupled compute-storage design and a distributed key-value storage layer to ensure horizontal scalability and efficient range-based queries. By employing a consensus-based replication algorithm, the system maintains high availability and automatic failover across multiple nodes and geographical regions.

The platform distinguishes itself through its hybrid transactional and analytical proc
- [firefly-iii/firefly-iii](https://awesome-repositories.com/repository/firefly-iii-firefly-iii.md) (22,431 ⭐) — Firefly III is a self-hosted personal finance management system built on a double-entry bookkeeping engine. It provides a comprehensive platform for tracking income, expenses, and account balances while maintaining financial integrity through structured accounting principles. Designed for private use, the system supports multi-user access, allowing independent financial administrations to coexist within a single installation.

The platform distinguishes itself through extensive automation and integration capabilities. It features a robust REST JSON API and webhook system that enables programma
