41 Repos
High-availability storage systems designed for synchronization across multiple locations or providers.
Distinguishing note: Focuses on the storage architecture for high availability, distinct from general data replication.
41 GitHub repositories matching Data & Databases · Distributed Storage.
This project is a community-curated directory of open-source software designed for use in private server environments and home labs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to popular cloud services, letting users retain full data sovereignty and control over their digital infrastructure. The directory is structured by a hierarchical taxonomy that organizes a vast collection of applications into logical categories, from media management and data analytics to private communication and team productivity tools. It is distinguished by a collaborative peer-review process in which community members validate the quality and relevance of each submission to ensure that the directory remains accurate and reliable. The project covers a broad range of capabilities, including infrastructure automation, container-based service deployment, and declarative configuration management. These tools help users maintain reproducible server environments and manage complex service dependencies on private hardware. The directory is maintained as a version-controlled repository, ensuring that all updates and community-driven changes are tracked and transparent.
Replicates data across multiple physical locations to ensure high availability and resilience for object storage services.
Ente is a privacy-focused platform for end-to-end encrypted storage and two-factor authentication management. It functions as a zero-knowledge identity provider, ensuring that all cryptographic operations, key derivation, and data encryption occur locally on the user's device. By maintaining this architecture, the service provider remains unable to access or decrypt any stored personal information or authentication credentials. The platform distinguishes itself through a combination of on-device intelligence and resilient data distribution. It utilizes a local machine learning engine to perfo
Synchronizes encrypted information across multiple geographical regions and providers to guarantee permanent accessibility.
Jaeger is a distributed tracing platform used for collecting, storing, and visualizing request flows across microservices. It identifies performance bottlenecks and errors by tracking requests as they move through multiple service boundaries. The system includes telemetry collectors, a multi-tenant backend, and a trace visualizer. The platform provides a multi-tenant tracing infrastructure that isolates data and queries by tenant to support shared environments. It supports standardized telemetry ingestion via the OpenTelemetry Protocol over gRPC and HTTP. To manage storage costs and overhead,
Persists trace data to distributed database backends for scalable long-term storage and retrieval.
This project is a feature-rich Go client library designed for interacting with Redis. It serves as a comprehensive interface for managing remote data stores, enabling developers to execute standard database commands, handle complex data structures, and perform asynchronous operations within Go applications. The library distinguishes itself through its support for advanced Redis capabilities, including connection pooling, pipelining, and transactional integrity. It provides specialized primitives for managing distributed clusters, including automated topology updates and request routing to sha
Distributes data across multiple nodes and replicates it to ensure high availability and persistent storage for large-scale datasets.
Luigi is a Python framework designed for building and managing complex batch data pipelines. It functions as a workflow orchestration engine that organizes tasks into directed acyclic graphs, ensuring that jobs execute in the correct logical order based on their dependencies. By utilizing a centralized scheduler, the system coordinates task execution across distributed environments, tracks global workflow state, and prevents redundant processing by verifying the existence of output targets before triggering any work. The project distinguishes itself through a robust state-tracking mechanism t
Integrates with distributed storage systems to read and write files within automated batch processing tasks.
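The output-target check described in the Luigi entry above can be illustrated with a minimal sketch. It does not use Luigi's actual API: the `Task` class, the `build` function, and the file names are hypothetical and stand in for the general idea of running dependencies first and skipping any task whose output already exists.

```python
import os
import tempfile


class Task:
    """Hypothetical minimal task, not Luigi's API."""

    def __init__(self, name, output_path, requires=()):
        self.name = name
        self.output_path = output_path
        self.requires = list(requires)

    def complete(self):
        # A task counts as done when its output target exists.
        return os.path.exists(self.output_path)

    def run(self):
        # Stand-in for real work: write the task name to the output file.
        with open(self.output_path, "w") as f:
            f.write(self.name)


def build(task, executed):
    """Run dependencies first; skip tasks whose output already exists."""
    if task.complete():
        return
    for dep in task.requires:
        build(dep, executed)
    task.run()
    executed.append(task.name)


# Assumed example pipeline: "transform" depends on "extract".
workdir = tempfile.mkdtemp()
extract = Task("extract", os.path.join(workdir, "extract.txt"))
transform = Task(
    "transform", os.path.join(workdir, "transform.txt"), requires=[extract]
)

first_run = []
build(transform, first_run)   # both outputs are missing, so both tasks run

second_run = []
build(transform, second_run)  # outputs now exist, so nothing runs again
```

In this sketch the second call does no work because both output files are already present, which is the redundancy check the description refers to.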
KubeSphere is a distributed operating system for cloud-native application management that provides a centralized control plane for Kubernetes clusters. It functions as a comprehensive DevOps portal, enabling teams to orchestrate containerized workloads, manage CI/CD pipelines, and enforce security policies across hybrid cloud, datacenter, and edge environments. The platform distinguishes itself through its multi-cluster federation capabilities and robust multi-tenancy model, which allow for logical resource isolation and granular access control across shared infrastructure. It integrates a mo
Manages distributed storage layers to optimize data access and performance for containerized workloads.
Scylla is a distributed wide-column NoSQL database designed as a high-performance data store. It functions as a Cassandra-compatible database and a DynamoDB-compatible store, implementing a shared-nothing architecture built on an asynchronous event-driven framework. The system emulates cloud-based APIs to support applications built for proprietary cloud protocols and implements the Cassandra Query Language for high-throughput workloads. This allows for the migration of cloud workloads to self-hosted environments while maintaining API compatibility. The project covers distributed data storage
Implements a distributed storage architecture ensuring high availability and fault tolerance across multiple nodes.
CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export. The platform distinguishes itself through deep integration with machine learning models, allowing users to deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports co
Supports mounting shared network file systems across nodes to ensure consistent access to large-scale datasets.
OneDev is a self-hosted, unified development platform that integrates Git repository hosting, issue tracking, and continuous integration and deployment (CI/CD) into a single system. It provides a comprehensive environment for managing the entire software lifecycle, allowing teams to coordinate code reviews, track development tasks, and automate build pipelines through a centralized interface. The platform distinguishes itself by offering browser-based, containerized development environments that allow developers to access and edit project files directly on the server. Its build system utilize
Scales and organizes data storage across multiple nodes to support large-scale project requirements.
Rook is a Kubernetes storage orchestrator and distributed storage operator that automates the deployment and management of storage clusters. It serves as a multi-protocol storage provider, offering block, file, and object storage capabilities to containerized workloads. The system focuses on providing a self-healing storage cluster that replicates data across hardware nodes to maintain availability and recover from failures. It uses an operator-led model to handle the installation, scaling, and upgrades of storage nodes and daemons. The orchestrator covers a broad range of provisioning servi
Implements a distributed storage operator to manage the lifecycle, scaling, and upgrades of block, file, and object storage.
The AWS Cloud Development Kit is an infrastructure-as-code framework that enables developers to define and provision cloud resources using familiar programming languages. By utilizing construct-based synthesis, it translates high-level, object-oriented code into declarative templates, allowing for the automated management of complex cloud environments through a centralized, code-driven control plane. The framework distinguishes itself through its ability to model infrastructure as a dependency-aware resource graph, ensuring that components are provisioned and updated in the correct order. It
Distributes storage resources across multiple accounts within an organization to facilitate collaborative data access.
This project serves as a comprehensive educational roadmap and technical resource collection for developers building decentralized finance applications. It provides a structured curriculum that guides users through the entire lifecycle of blockchain development, from mastering smart contract architecture and security best practices to integrating decentralized infrastructure into modern web applications. The repository distinguishes itself by offering a holistic view of the decentralized ecosystem, bridging the gap between low-level protocol interaction and high-level application design. It c
Offloads large data assets to decentralized file systems to ensure content persistence and availability.
FastDFS is a distributed file system and object store designed as a high-capacity file server. It functions as a cluster storage manager that saves, syncs, and accesses large volumes of unstructured data across a network of distributed servers. The system uses unique identifiers for file retrieval and indexing instead of traditional hierarchical naming to avoid metadata bottlenecks. It manages file attributes through key-value metadata mapping and employs a distributed replication model to ensure high availability and data redundancy across storage groups. The project provides capabilities f
Saves and manages large volumes of unstructured data across a network of distributed servers for increased capacity.
This project is a GitOps infrastructure framework designed for managing bare metal servers, container clusters, and networking. It serves as a declarative system for orchestrating the deployment and lifecycle of self-hosted services, using Git as the source of truth to synchronize the desired state of the environment. The framework differentiates itself through a comprehensive automation suite that covers the entire hardware-to-service pipeline. It includes a PXE-based bare metal provisioner for network booting and operating system installation, alongside a lightweight container orchestration
Implements a redundant, distributed block storage layer to provide persistent data access for containerized applications.
Delta is a lakehouse table format that brings ACID transactions and data warehouse consistency to large scale data lakes on cloud object storage. It serves as an ACID transaction manager, coordinating atomic commits and serializable isolation for concurrent reads and writes across distributed compute engines. The project provides a multi-engine interoperability layer that uses format translation to allow diverse SQL engines and processing frameworks to read and write the same tables. It functions as a data versioning system, utilizing a transaction log to enable time travel, historical snapsh
Distributes data securely across multiple cloud providers and organizational accounts using standardized protocols.
This project is a comprehensive educational resource and curriculum focused on site reliability engineering, distributed systems, and infrastructure operations. It provides technical guides, a systems engineering course, and instructional manuals designed to teach the principles of managing large-scale computing environments. The curriculum covers high-level architectural design for scalability and resilience, including fault-tolerant infrastructure, high-availability patterns, and microservices decomposition. It emphasizes the practical application of site reliability engineering through the
Teaches how to manage high-throughput access to large datasets using fault-tolerant distributed storage systems.
ntopng is a web-based tool for monitoring network traffic and aggregating flow data. It functions as a network security monitor, SNMP network management system, and industrial protocol analyzer for OT and SCADA environments. The system offers specialized inspection for industrial protocols such as Modbus, DNP3, and IEC 60870. It is distinguished by behavior-based threat detection, analysis of encrypted traffic via handshake fingerprinting, and the ability to identify hardware and operating systems from DHCP and MAC address patterns. Further features include real-time traffic analysis and packet capture, network topology mapping, and the orchestration of hierarchical collector structures. The platform also manages network access control through captive portals, enforces traffic quotas, and exports flow and alert data to external databases such as ClickHouse, Elasticsearch, and Kafka. The project supports running multiple independent monitoring instances on a single host using isolated configurations.
Distributes network monitoring data across database nodes to ensure high availability and scalable storage.
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Allows sharing a single storage implementation across multiple data structures using factory patterns.
Microsandbox is a runtime for creating and managing lightweight, hardware-isolated virtual machines — called sandboxes — that boot directly from standard OCI container images. Each sandbox runs as its own host process with a separate kernel, filesystem, and network stack, providing process-per-sandbox isolation. The project includes a command-line tool and multi-language SDKs (Rust, TypeScript, Python, Go) for programmatic lifecycle control, and it communicates with sandbox agents over Unix sockets using a CBOR-encoded protocol. What distinguishes Microsandbox is its combination of host-manag
Mounts directory-backed volumes or disk images across sandboxes for shared filesystem access.
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Provides a SQL interface to query aggregated event data, enabling unified views across distributed microservice architectures.