3 repositories
Storage layers specifically optimized for tracking metadata in distributed architectures.
Distinct from System Metadata: this category covers the use of a key-value (KV) store as the metadata layer for a distributed system.
Explore 3 GitHub repositories matching Data & Databases · Distributed System Metadata.
RocksDB is a high-performance, embeddable persistent key-value library and storage engine based on Log-Structured Merge-trees. It is designed to provide durable storage for large-scale datasets, integrating directly into applications to manage data on flash and RAM-based hardware. The engine is distinguished by its focus on minimizing read and write amplification through multi-threaded compaction and custom memory allocators. It features specialized optimizations for flash storage, including support for zoned block devices, and provides the ability to extend store behavior via external plugin
Provides a durable and efficient storage layer for tracking metadata in large-scale distributed systems.
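To make the "KV store as a metadata layer" idea concrete, here is a minimal sketch of how metadata records might be laid out in an ordered key-value store. It is an illustration only: `OrderedKV` is a hypothetical in-memory stand-in, not the RocksDB API, and the `chunk/<volume>/<id>` key scheme, `chunk_key` helper, and node names are all assumptions made for this example.

```python
import bisect


class OrderedKV:
    """Hypothetical in-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order
        self._data = {}   # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key: bytes):
        return self._data.get(key)

    def scan_prefix(self, prefix: bytes):
        # Start at the first key >= prefix; stop once keys no longer share it.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._data[key]
            i += 1


def chunk_key(volume: str, chunk_id: int) -> bytes:
    # Assumed key scheme: zero-padded ids keep byte order equal to numeric order.
    return f"chunk/{volume}/{chunk_id:010d}".encode()


store = OrderedKV()
store.put(chunk_key("vol-a", 2), b"node-7")
store.put(chunk_key("vol-a", 1), b"node-3")
store.put(chunk_key("vol-b", 1), b"node-5")

# All chunk records for one volume come back from a single ordered prefix scan.
vol_a = [(k.decode(), v.decode()) for k, v in store.scan_prefix(b"chunk/vol-a/")]
```

Because keys are sorted, grouping related metadata under a shared prefix turns "list everything for this volume" into one contiguous range read, which is the access pattern ordered stores of this kind are generally good at.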
Alluxio is a virtual distributed file system and data orchestration layer that serves as a high-performance caching layer between cloud storage and compute clusters. It acts as a distributed data cache designed to accelerate data access for large-scale analytics and machine learning workloads. The system provides a unified interface that presents multiple heterogeneous storage backends as a single coherent namespace. This allows for the unification of diverse storage systems, enabling computation engines to access data from different providers without changing application code. The project c
Provides a distributed metadata service to manage file system hierarchy and location mapping across the cluster.
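The two responsibilities named above, file system hierarchy and location mapping, can be sketched as a small in-memory model. This is a simplified illustration under stated assumptions, not Alluxio's implementation: `Inode`, `MetadataService`, the method names, and the worker names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    name: str
    is_dir: bool
    children: dict = field(default_factory=dict)   # name -> Inode (directories)
    block_ids: list = field(default_factory=list)  # ordered block ids (files)


class MetadataService:
    """Hypothetical single-process model of a namespace plus block locations."""

    def __init__(self):
        self.root = Inode("", True)
        self.block_locations = {}  # block id -> set of worker names

    def _walk(self, path, create=False):
        node = self.root
        for part in [p for p in path.split("/") if p]:
            if part not in node.children:
                if not create:
                    raise FileNotFoundError(path)
                node.children[part] = Inode(part, True)
            node = node.children[part]
        return node

    def create_file(self, path, block_ids):
        # Create missing parent directories, then attach the file inode.
        parent_path, _, name = path.rpartition("/")
        parent = self._walk(parent_path, create=True)
        parent.children[name] = Inode(name, False, block_ids=list(block_ids))

    def report_block(self, block_id, worker):
        # Workers announce which blocks they currently hold.
        self.block_locations.setdefault(block_id, set()).add(worker)

    def list_dir(self, path):
        return sorted(self._walk(path).children)

    def locate(self, path):
        # Map each block of the file to the workers that reported holding it.
        node = self._walk(path)
        return [(b, sorted(self.block_locations.get(b, ()))) for b in node.block_ids]


svc = MetadataService()
svc.create_file("/data/logs/a.log", [101, 102])
svc.report_block(101, "worker-1")
svc.report_block(101, "worker-2")
svc.report_block(102, "worker-2")

listing = svc.list_dir("/data/logs")
locations = svc.locate("/data/logs/a.log")
```

Keeping the namespace tree separate from the block-to-worker map means location data can change as workers come and go without touching the file hierarchy itself.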
Gravitino is a federated metadata lake and unified data catalog designed to manage tables, files, and AI models across diverse data sources and cloud storage. It serves as a centralized interface for governing schemas, access controls, and tagging across relational databases, messaging queues, and object stores. The project distinguishes itself by unifying the management of AI assets, such as machine learning models and their version lineages, alongside traditional tabular data. It also implements the Iceberg REST specification to provide a standardized metadata server and proxy for lakehouse
Links distributed dataframes to a unified metadata system for retrieving tables and filesets from diverse sources.