41 repositories
High-availability storage systems designed for synchronization across multiple locations or providers.
Distinguishing note: Focuses on the storage architecture for high availability, distinct from general data replication.
Explore 41 GitHub repositories matching Data & Databases · Distributed Storage.
This project is a community-curated directory of open-source software designed for deployment on private servers and in homelabs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, enabling users to retain full ownership of their data and control over their digital infrastructure. The directory is organized through a hierarchical taxonomy that arranges a wide range of applications into logical categories, from media management and data analytics to private communication and team productivity tools. It features a collaborative peer-review process in which community members verify the quality and suitability of each submission so that the directory stays accurate and reliable. The project covers a broad range of capabilities, including infrastructure automation, container-based service deployment, and declarative configuration management. These tools help users maintain reproducible server environments and manage complex service dependencies across private hardware. The directory is maintained as a version-controlled repository, so all community-driven updates and changes are tracked and transparent.
Replicates data across multiple physical locations to ensure high availability and resilience for object storage services.
Ente is a privacy-focused platform for end-to-end encrypted storage and two-factor authentication management. It functions as a zero-knowledge identity provider, ensuring that all cryptographic operations, key derivation, and data encryption occur locally on the user's device. By maintaining this architecture, the service provider remains unable to access or decrypt any stored personal information or authentication credentials. The platform distinguishes itself through a combination of on-device intelligence and resilient data distribution. It utilizes a local machine learning engine to perfo
Synchronizes encrypted information across multiple geographical regions and providers to guarantee permanent accessibility.
Jaeger is a distributed tracing platform used for collecting, storing, and visualizing request flows across microservices. It identifies performance bottlenecks and errors by tracking requests as they move through multiple service boundaries. The system includes telemetry collectors, a multi-tenant backend, and a trace visualizer. The platform provides a multi-tenant tracing infrastructure that isolates data and queries by tenant to support shared environments. It supports standardized telemetry ingestion via the OpenTelemetry Protocol over gRPC and HTTP. To manage storage costs and overhead,
Persists trace data to distributed database backends for scalable long-term storage and retrieval.
This project is a feature-rich Go client library designed for interacting with Redis. It serves as a comprehensive interface for managing remote data stores, enabling developers to execute standard database commands, handle complex data structures, and perform asynchronous operations within Go applications. The library distinguishes itself through its support for advanced Redis capabilities, including connection pooling, pipelining, and transactional integrity. It provides specialized primitives for managing distributed clusters, including automated topology updates and request routing to sha
Distributes data across multiple nodes and replicates it to ensure high availability and persistent storage for large-scale datasets.
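The request routing mentioned in this entry can be illustrated with the key-to-slot mapping that Redis Cluster defines: each key is hashed with CRC16 and reduced modulo 16384, and a cluster-aware client sends the command to the node that owns that slot. The sketch below is a minimal Python version of that mapping, not code from this Go library; `key_slot` is a hypothetical helper name chosen for illustration.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in a Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the cluster hash slot for a key (hypothetical helper name)."""
    # Hash tags: if the key contains a non-empty {...} section, only the
    # text between the first '{' and the next '}' is hashed, so related
    # keys can be forced onto the same slot.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with initial value 0 is the CRC16/XMODEM variant.
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


# Keys sharing a hash tag land on the same slot.
same_slot = key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers")
```

A client would then look up the slot in its cached topology map to pick the target node; that lookup is omitted here.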
Luigi is a Python framework designed for building and managing complex batch data pipelines. It functions as a workflow orchestration engine that organizes tasks into directed acyclic graphs, ensuring that jobs execute in the correct logical order based on their dependencies. By utilizing a centralized scheduler, the system coordinates task execution across distributed environments, tracks global workflow state, and prevents redundant processing by verifying the existence of output targets before triggering any work. The project distinguishes itself through a robust state-tracking mechanism t
Integrates with distributed storage systems to read and write files within automated batch processing tasks.
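The idea of skipping work when an output target already exists can be sketched in plain Python. This is an illustration of the pattern only and does not use Luigi's API; the `Task` class, the `build` function, and the file names are hypothetical.

```python
import os
import tempfile


class Task:
    """Hypothetical minimal task: one output file plus upstream dependencies."""

    def __init__(self, output_path, requires=()):
        self.output_path = output_path
        self.requires = list(requires)

    def complete(self):
        # A task counts as done when its output target already exists.
        return os.path.exists(self.output_path)

    def run(self):
        with open(self.output_path, "w") as fh:
            fh.write("done\n")


def build(task, executed=None):
    """Run dependencies first, skipping any task whose output exists."""
    executed = [] if executed is None else executed
    if task.complete():
        return executed
    for dep in task.requires:
        build(dep, executed)
    task.run()
    executed.append(task.output_path)
    return executed


workdir = tempfile.mkdtemp()
extract = Task(os.path.join(workdir, "extract.txt"))
report = Task(os.path.join(workdir, "report.txt"), requires=[extract])

first = build(report)   # runs extract, then report
second = build(report)  # nothing to do: both outputs already exist
```

Because completeness is derived from the outputs themselves, rerunning a partially failed pipeline only repeats the tasks whose targets are missing.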
KubeSphere is a distributed operating system for cloud-native application management that provides a centralized control plane for Kubernetes clusters. It functions as a comprehensive DevOps portal, enabling teams to orchestrate containerized workloads, manage CI/CD pipelines, and enforce security policies across hybrid cloud, datacenter, and edge environments. The platform distinguishes itself through its multi-cluster federation capabilities and robust multi-tenancy model, which allow for logical resource isolation and granular access control across shared infrastructure. It integrates a mo
Manages distributed storage layers to optimize data access and performance for containerized workloads.
Scylla is a distributed wide column NoSQL database designed as a high-performance data store. It functions as a Cassandra compatible database and a DynamoDB compatible store, implementing a shared-nothing architecture built on an asynchronous event-driven framework. The system emulates cloud-based APIs to support applications built for proprietary cloud protocols and implements the Cassandra Query Language for high-throughput workloads. This allows for the migration of cloud workloads to self-hosted environments while maintaining API compatibility. The project covers distributed data storage
Implements a distributed storage architecture ensuring high availability and fault tolerance across multiple nodes.
CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export. The platform distinguishes itself through deep integration with machine learning models, allowing users to deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports co
Supports mounting shared network file systems across nodes to ensure consistent access to large-scale datasets.
OneDev is a self-hosted, unified development platform that integrates Git repository hosting, issue tracking, and continuous integration and deployment (CI/CD) into a single system. It provides a comprehensive environment for managing the entire software lifecycle, allowing teams to coordinate code reviews, track development tasks, and automate build pipelines through a centralized interface. The platform distinguishes itself by offering browser-based, containerized development environments that allow developers to access and edit project files directly on the server. Its build system utilize
Scales and organizes data storage across multiple nodes to support large-scale project requirements.
Rook is a Kubernetes storage orchestrator and distributed storage operator that automates the deployment and management of storage clusters. It serves as a multi-protocol storage provider, offering block, file, and object storage capabilities to containerized workloads. The system focuses on providing a self-healing storage cluster that replicates data across hardware nodes to maintain availability and recover from failures. It uses an operator-led model to handle the installation, scaling, and upgrades of storage nodes and daemons. The orchestrator covers a broad range of provisioning servi
Implements a distributed storage operator to manage the lifecycle, scaling, and upgrades of block, file, and object storage.
The AWS Cloud Development Kit is an infrastructure-as-code framework that enables developers to define and provision cloud resources using familiar programming languages. By utilizing construct-based synthesis, it translates high-level, object-oriented code into declarative templates, allowing for the automated management of complex cloud environments through a centralized, code-driven control plane. The framework distinguishes itself through its ability to model infrastructure as a dependency-aware resource graph, ensuring that components are provisioned and updated in the correct order. It
Distributes storage resources across multiple accounts within an organization to facilitate collaborative data access.
This project serves as a comprehensive educational roadmap and technical resource collection for developers building decentralized finance applications. It provides a structured curriculum that guides users through the entire lifecycle of blockchain development, from mastering smart contract architecture and security best practices to integrating decentralized infrastructure into modern web applications. The repository distinguishes itself by offering a holistic view of the decentralized ecosystem, bridging the gap between low-level protocol interaction and high-level application design. It c
Offloads large data assets to decentralized file systems to ensure content persistence and availability.
FastDFS is a distributed file system and object store designed as a high-capacity file server. It functions as a cluster storage manager that saves, syncs, and accesses large volumes of unstructured data across a network of distributed servers. The system uses unique identifiers for file retrieval and indexing instead of traditional hierarchical naming to avoid metadata bottlenecks. It manages file attributes through key-value metadata mapping and employs a distributed replication model to ensure high availability and data redundancy across storage groups. The project provides capabilities f
Saves and manages large volumes of unstructured data across a network of distributed servers for increased capacity.
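Addressing files by a store-assigned identifier with key-value metadata, rather than by a hierarchical path, might look like the in-memory sketch below. It is not FastDFS client code: `FlatFileStore` and its methods are hypothetical, and the UUID is a stand-in for whatever identifier format the real system generates.

```python
import uuid


class FlatFileStore:
    """Hypothetical in-memory store that addresses files by generated ID."""

    def __init__(self):
        self._blobs = {}
        self._meta = {}

    def upload(self, data: bytes, **metadata) -> str:
        # The store assigns the ID; callers never choose a path or name.
        file_id = uuid.uuid4().hex
        self._blobs[file_id] = data
        self._meta[file_id] = dict(metadata)
        return file_id

    def download(self, file_id: str) -> bytes:
        return self._blobs[file_id]

    def get_metadata(self, file_id: str) -> dict:
        # Return a copy so callers cannot mutate stored attributes.
        return dict(self._meta[file_id])


store = FlatFileStore()
fid = store.upload(b"example bytes", width="640", height="480")
```

With no directory tree to traverse or lock, a lookup is a single keyed access, which is the property the description credits with avoiding metadata bottlenecks.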
This project is a GitOps infrastructure framework designed for managing bare metal servers, container clusters, and networking. It serves as a declarative system for orchestrating the deployment and lifecycle of self-hosted services, using Git as the source of truth to synchronize the desired state of the environment. The framework differentiates itself through a comprehensive automation suite that covers the entire hardware-to-service pipeline. It includes a PXE-based bare metal provisioner for network booting and operating system installation, alongside a lightweight container orchestration
Implements a redundant, distributed block storage layer to provide persistent data access for containerized applications.
Delta is a lakehouse table format that brings ACID transactions and data warehouse consistency to large scale data lakes on cloud object storage. It serves as an ACID transaction manager, coordinating atomic commits and serializable isolation for concurrent reads and writes across distributed compute engines. The project provides a multi-engine interoperability layer that uses format translation to allow diverse SQL engines and processing frameworks to read and write the same tables. It functions as a data versioning system, utilizing a transaction log to enable time travel, historical snapsh
Distributes data securely across multiple cloud providers and organizational accounts using standardized protocols.
This project is a comprehensive educational resource and curriculum focused on site reliability engineering, distributed systems, and infrastructure operations. It provides technical guides, a systems engineering course, and instructional manuals designed to teach the principles of managing large-scale computing environments. The curriculum covers high-level architectural design for scalability and resilience, including fault-tolerant infrastructure, high-availability patterns, and microservices decomposition. It emphasizes the practical application of site reliability engineering through the
Teaches how to manage high-throughput access to large datasets using fault-tolerant distributed storage systems.
ntopng is a web-based network traffic monitoring tool and flow data collector. It functions as a network security monitor, an SNMP network management system, and an industrial protocol analyzer for OT and SCADA environments. The system provides specialized inspection of industrial protocols such as Modbus, DNP3, and IEC 60870. It distinguishes itself through behavioral threat detection, encrypted traffic analysis via handshake fingerprints, and the ability to identify devices and operating systems using DHCP patterns and MAC addresses. Its broader capabilities include real-time traffic analysis and packet capture, network topology mapping, and coordination of collector hierarchies. The system also manages network access control through authentication portals, enforces traffic quotas, and exports flow data and alerts to external systems such as ClickHouse, Elasticsearch, and Kafka. The project supports running multiple independent monitoring instances on a single host using isolated configurations.
Distributes network monitoring data across database nodes to ensure high availability and scalable storage.
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Allows sharing a single storage implementation across multiple data structures using factory patterns.
Microsandbox is a runtime for creating and managing lightweight, hardware-isolated virtual machines — called sandboxes — that boot directly from standard OCI container images. Each sandbox runs as its own host process with a separate kernel, filesystem, and network stack, providing process-per-sandbox isolation. The project includes a command-line tool and multi-language SDKs (Rust, TypeScript, Python, Go) for programmatic lifecycle control, and it communicates with sandbox agents over Unix sockets using a CBOR-encoded protocol. What distinguishes Microsandbox is its combination of host-manag
Mounts directory-backed volumes or disk images across sandboxes for shared filesystem access.
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Provides a SQL interface to query aggregated event data, enabling unified views across distributed microservice architectures.