41 repositories

Awesome GitHub Repositories · Distributed Storage

High-availability storage systems designed for synchronization across multiple locations or providers.

Distinguishing note: Focuses on the storage architecture for high availability, distinct from general data replication.

Explore 41 awesome GitHub repositories matching data & databases · Distributed Storage.

awesome-selfhosted/awesome-selfhosted
299,516
This project is a community-curated directory of open-source software designed for deployment on private servers and in homelab environments. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, enabling users to retain full ownership of their data and control over their digital infrastructure. The directory is organized through a hierarchical taxonomy that sorts a wide range of applications into logical categories, ranging from media management and data analytics to private communication and team productivity tools. It features a collaborative peer-review process in which community members verify the quality and suitability of each submission so that the directory stays accurate and reliable. The project covers a broad range of capabilities, including infrastructure automation, container-based service deployment, and declarative configuration management. These tools help users maintain reproducible server environments and manage complex service dependencies across private hardware. The directory is maintained as a version-controlled repository, ensuring that all community-driven updates and changes are tracked and transparent.
Replicates data across multiple physical locations to ensure high availability and resilience for object storage services.
awesome, awesome-list, cloud

ente-io/ente
27,281
Ente is a privacy-focused platform for end-to-end encrypted storage and two-factor authentication management. It functions as a zero-knowledge identity provider, ensuring that all cryptographic operations, key derivation, and data encryption occur locally on the user's device. By maintaining this architecture, the service provider remains unable to access or decrypt any stored personal information or authentication credentials. The platform distinguishes itself through a combination of on-device intelligence and resilient data distribution. It utilizes a local machine learning engine to perfo
Synchronizes encrypted information across multiple geographical regions and providers to guarantee permanent accessibility.
Dart, 2fa, android, authy

jaegertracing/jaeger
22,890
Jaeger is a distributed tracing platform used for collecting, storing, and visualizing request flows across microservices. It identifies performance bottlenecks and errors by tracking requests as they move through multiple service boundaries. The system includes telemetry collectors, a multi-tenant backend, and a trace visualizer. The platform provides a multi-tenant tracing infrastructure that isolates data and queries by tenant to support shared environments. It supports standardized telemetry ingestion via the OpenTelemetry Protocol over gRPC and HTTP. To manage storage costs and overhead,
Persists trace data to distributed database backends for scalable long-term storage and retrieval.
Go, cncf, distributed-tracing, hacktoberfest

redis/go-redis
22,159
This project is a feature-rich Go client library designed for interacting with Redis. It serves as a comprehensive interface for managing remote data stores, enabling developers to execute standard database commands, handle complex data structures, and perform asynchronous operations within Go applications. The library distinguishes itself through its support for advanced Redis capabilities, including connection pooling, pipelining, and transactional integrity. It provides specialized primitives for managing distributed clusters, including automated topology updates and request routing to sha
Distributes data across multiple nodes and replicates it to ensure high availability and persistent storage for large-scale datasets.
Go, go, golang, redis

spotify/luigi
18,676
Luigi is a Python framework designed for building and managing complex batch data pipelines. It functions as a workflow orchestration engine that organizes tasks into directed acyclic graphs, ensuring that jobs execute in the correct logical order based on their dependencies. By utilizing a centralized scheduler, the system coordinates task execution across distributed environments, tracks global workflow state, and prevents redundant processing by verifying the existence of output targets before triggering any work. The project distinguishes itself through a robust state-tracking mechanism t
Integrates with distributed storage systems to read and write files within automated batch processing tasks.
Python, hadoop, luigi, orchestration-framework

kubesphere/kubesphere
16,842
KubeSphere is a distributed operating system for cloud-native application management that provides a centralized control plane for Kubernetes clusters. It functions as a comprehensive DevOps portal, enabling teams to orchestrate containerized workloads, manage CI/CD pipelines, and enforce security policies across hybrid cloud, datacenter, and edge environments. The platform distinguishes itself through its multi-cluster federation capabilities and robust multi-tenancy model, which allow for logical resource isolation and granular access control across shared infrastructure. It integrates a mo
Manages distributed storage layers to optimize data access and performance for containerized workloads.
Go, argocd, cloud-native, cncf

scylladb/scylla
15,609
Scylla is a distributed wide column NoSQL database designed as a high-performance data store. It functions as a Cassandra compatible database and a DynamoDB compatible store, implementing a shared-nothing architecture built on an asynchronous event-driven framework. The system emulates cloud-based APIs to support applications built for proprietary cloud protocols and implements the Cassandra Query Language for high-throughput workloads. This allows for the migration of cloud workloads to self-hosted environments while maintaining API compatibility. The project covers distributed data storage
Implements a distributed storage architecture ensuring high availability and fault tolerance across multiple nodes.
C++

cvat-ai/cvat
15,317
CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export. The platform distinguishes itself through deep integration with machine learning models, allowing users to deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports co
Supports mounting shared network file systems across nodes to ensure consistent access to large-scale datasets.
Python, annotation, annotation-tool, annotations

theonedev/onedev
14,705
OneDev is a self-hosted, unified development platform that integrates Git repository hosting, issue tracking, and continuous integration and deployment (CI/CD) into a single system. It provides a comprehensive environment for managing the entire software lifecycle, allowing teams to coordinate code reviews, track development tasks, and automate build pipelines through a centralized interface. The platform distinguishes itself by offering browser-based, containerized development environments that allow developers to access and edit project files directly on the server. Its build system utilize
Scales and organizes data storage across multiple nodes to support large-scale project requirements.
Java, ci-cd, devops, git

rook/rook
13,553
Rook is a Kubernetes storage orchestrator and distributed storage operator that automates the deployment and management of storage clusters. It serves as a multi-protocol storage provider, offering block, file, and object storage capabilities to containerized workloads. The system focuses on providing a self-healing storage cluster that replicates data across hardware nodes to maintain availability and recover from failures. It uses an operator-led model to handle the installation, scaling, and upgrades of storage nodes and daemons. The orchestrator covers a broad range of provisioning servi
Implements a distributed storage operator to manage the lifecycle, scaling, and upgrades of block, file, and object storage.
Go, ceph, cloud-native, cncf

aws/aws-cdk
12,817
The AWS Cloud Development Kit is an infrastructure-as-code framework that enables developers to define and provision cloud resources using familiar programming languages. By utilizing construct-based synthesis, it translates high-level, object-oriented code into declarative templates, allowing for the automated management of complex cloud environments through a centralized, code-driven control plane. The framework distinguishes itself through its ability to model infrastructure as a dependency-aware resource graph, ensuring that components are provisioned and updated in the correct order. It
Distributes storage resources across multiple accounts within an organization to facilitate collaborative data access.
TypeScript, aws, cloud-infrastructure, hacktoberfest

OffcierCia/DeFi-Developer-Road-Map
10,697
This project serves as a comprehensive educational roadmap and technical resource collection for developers building decentralized finance applications. It provides a structured curriculum that guides users through the entire lifecycle of blockchain development, from mastering smart contract architecture and security best practices to integrating decentralized infrastructure into modern web applications. The repository distinguishes itself by offering a holistic view of the decentralized ecosystem, bridging the gap between low-level protocol interaction and high-level application design. It c
Offloads large data assets to decentralized file systems to ensure content persistence and availability.
JavaScript, awesome, awesome-list, blockchain

happyfish100/fastdfs
9,231
FastDFS is a distributed file system and object store designed as a high-capacity file server. It functions as a cluster storage manager that saves, syncs, and accesses large volumes of unstructured data across a network of distributed servers. The system uses unique identifiers for file retrieval and indexing instead of traditional hierarchical naming to avoid metadata bottlenecks. It manages file attributes through key-value metadata mapping and employs a distributed replication model to ensure high availability and data redundancy across storage groups. The project provides capabilities f
Saves and manages large volumes of unstructured data across a network of distributed servers for increased capacity.
C, distributed-file-storage, distributed-file-system, storage-servers

khuedoan/homelab
9,109
This project is a GitOps infrastructure framework designed for managing bare metal servers, container clusters, and networking. It serves as a declarative system for orchestrating the deployment and lifecycle of self-hosted services, using Git as the source of truth to synchronize the desired state of the environment. The framework differentiates itself through a comprehensive automation suite that covers the entire hardware-to-service pipeline. It includes a PXE-based bare metal provisioner for network booting and operating system installation, alongside a lightweight container orchestration
Implements a redundant, distributed block storage layer to provide persistent data access for containerized applications.
Python, ansible, argocd, devops

delta-io/delta
8,596
Delta is a lakehouse table format that brings ACID transactions and data warehouse consistency to large scale data lakes on cloud object storage. It serves as an ACID transaction manager, coordinating atomic commits and serializable isolation for concurrent reads and writes across distributed compute engines. The project provides a multi-engine interoperability layer that uses format translation to allow diverse SQL engines and processing frameworks to read and write the same tables. It functions as a data versioning system, utilizing a transaction log to enable time travel, historical snapsh
Distributes data securely across multiple cloud providers and organizational accounts using standardized protocols.
Scala, acid, analytics, big-data

linkedin/school-of-sre
8,093
This project is a comprehensive educational resource and curriculum focused on site reliability engineering, distributed systems, and infrastructure operations. It provides technical guides, a systems engineering course, and instructional manuals designed to teach the principles of managing large-scale computing environments. The curriculum covers high-level architectural design for scalability and resilience, including fault-tolerant infrastructure, high-availability patterns, and microservices decomposition. It emphasizes the practical application of site reliability engineering through the
Teaches how to manage high-throughput access to large datasets using fault-tolerant distributed storage systems.
HTML, git, hadoop, linux

ntop/ntopng
7,880
ntopng is a web-based network traffic monitoring tool and flow data collector. It functions as a network security monitor, an SNMP network management system, and an industrial protocol analyzer for OT and SCADA environments. The system provides specialized inspection of industrial protocols such as Modbus, DNP3, and IEC 60870. It distinguishes itself through behavioral threat detection, encrypted traffic analysis via handshake fingerprints, and the ability to identify devices and operating systems using DHCP patterns and MAC addresses. Its broader capabilities include real-time traffic analysis and packet capture, network topology mapping, and the coordination of collector hierarchies. The system also manages network access control through authentication portals, enforces traffic quotas, and exports flow data and alerts to external systems such as ClickHouse, Elasticsearch, and Kafka. The project supports running multiple independent monitoring instances on a single host using isolated configurations.
Distributes network monitoring data across database nodes to ensure high availability and scalable storage.
Lua

hazelcast/hazelcast
6,570
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Allows sharing a single storage implementation across multiple data structures using factory patterns.
Java, big-data, caching, data-in-motion

superradcompany/microsandbox
6,570
Microsandbox is a runtime for creating and managing lightweight, hardware-isolated virtual machines — called sandboxes — that boot directly from standard OCI container images. Each sandbox runs as its own host process with a separate kernel, filesystem, and network stack, providing process-per-sandbox isolation. The project includes a command-line tool and multi-language SDKs (Rust, TypeScript, Python, Go) for programmatic lifecycle control, and it communicates with sandbox agents over Unix sockets using a CBOR-encoded protocol. What distinguishes Microsandbox is its combination of host-manag
Mounts directory-backed volumes or disk images across sandboxes for shared filesystem access.
Rust

apache/pinot
6,098
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Provides a SQL interface to query aggregated event data, enabling unified views across distributed microservice architectures.
Java
