8 Repos
Systems that store remote files and objects in a local cache to reduce latency for analytical queries.
Distinct from general distributed caching: this category focuses on caching data from remote storage to improve analytical query performance.
8 GitHub repositories matching data & databases · Distributed Data Caching Layers.
Presto is a distributed SQL query engine designed for high-performance analytical processing across heterogeneous data sources. It functions as a data federation platform and massively parallel processing engine, allowing users to execute interactive queries against diverse storage systems without requiring data migration. By mapping remote metadata and structures to a unified relational namespace, it enables seamless cross-platform analysis through a standard SQL interface. The engine distinguishes itself through a pluggable connector architecture and a shared-nothing distributed processing
Stores files and objects from remote storage in a local cache layer to speed up data retrieval.
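The idea of keeping remote files in a local cache layer can be illustrated with a small read-through cache. This is a minimal sketch, not Presto's implementation: the class name `LocalFileCache` and the `fetch` callable are hypothetical, and `fetch` stands in for whatever client actually reads from remote storage.

```python
import hashlib
import os
import tempfile
from typing import Callable


class LocalFileCache:
    """Read-through cache: a remote object is copied to local disk on first access."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch  # hypothetical callable that reads bytes from remote storage
        os.makedirs(cache_dir, exist_ok=True)

    def _local_path(self, remote_key: str) -> str:
        # Hash the remote key so any URL or object name maps to a safe file name.
        digest = hashlib.sha256(remote_key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def read(self, remote_key: str) -> bytes:
        path = self._local_path(remote_key)
        try:
            with open(path, "rb") as f:
                return f.read()  # cache hit: served from local disk
        except FileNotFoundError:
            pass
        data = self.fetch(remote_key)  # cache miss: go to remote storage
        # Write to a temporary file first, then rename, so readers never see a partial file.
        fd, tmp_path = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)
        return data
```

A production cache would also need eviction and invalidation when the remote object changes; both are left out here to keep the read path visible.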
q is a command-line utility for the processing, filtering, and aggregation of tabular text and database files using standard SQL syntax. It functions as a query engine that treats CSV and TSV files, as well as standard input, as relational database tables. The tool distinguishes itself by providing a persistent cache layer that stores processed tabular data in a binary format to accelerate repeated queries on large datasets. It also maps individual filenames or stream identifiers to relational table names, enabling SQL joins across disparate text files. The project covers a broad range of da
Stores processed tabular data in a binary format to bypass repetitive parsing of large raw text files.
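The parse-once, reuse-later pattern described above can be sketched as follows. This is an illustration under stated assumptions, not q's actual cache format: `pickle` is used as a stand-in binary format, and the function name `load_rows` and the `.cache.pkl` suffix are hypothetical.

```python
import csv
import os
import pickle
from typing import List


def load_rows(csv_path: str) -> List[List[str]]:
    """Return the parsed rows of a CSV file, reusing a binary cache when it is fresh."""
    cache_path = csv_path + ".cache.pkl"  # hypothetical naming convention
    # Reuse the cache only if it is at least as new as the source file.
    if os.path.exists(cache_path) and os.path.getmtime(cache_path) >= os.path.getmtime(csv_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Cache missing or stale: parse the raw text and store the parsed result.
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    with open(cache_path, "wb") as f:
        pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)
    return rows
```

The modification-time check is the simplest possible freshness test; a content hash would be more robust at the cost of reading the source file on every call.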
Alluxio is a virtual distributed file system and data orchestration layer that serves as a high-performance caching layer between cloud storage and compute clusters. It acts as a distributed data cache designed to accelerate data access for large-scale analytics and machine learning workloads. The system provides a unified interface that presents multiple heterogeneous storage backends as a single coherent namespace. This allows for the unification of diverse storage systems, enabling computation engines to access data from different providers without changing application code. The project c
Functions as a virtual distributed file system that abstracts and caches data across diverse storage backends.
CppGuide is a curated collection of educational resources and practical guides focused on C++ server development, Linux kernel internals, concurrent programming, network protocols, and security exploitation. It provides structured learning paths for backend developers, covering everything from interview preparation to building high-performance network servers and understanding operating system fundamentals. The guide distinguishes itself by offering in-depth, hands-on tutorials that walk through real-world implementations, including building a Redis-like server from scratch, designing custom
Guides building a thread-safe, sharded distributed cache with configurable eviction policies.
Concurrent-map is a lock-striped hash map and sharded concurrent cache for Go, designed as a high-performance key-value store that enables thread-safe parallel reads and writes with minimal blocking. It replaces a single global mutex with per-shard locking, using hash-based key distribution to assign entries to independent segments, allowing multiple goroutines to operate simultaneously without race conditions. The library achieves its performance through fine-grained locking, where each shard operates independently with its own lock, enabling parallel reads and writ
Provides a thread-safe in-memory cache supporting parallel reads and writes without race conditions.
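Lock striping, as described above, replaces one global lock with one lock per shard and routes each key to a shard by hash. The following is a minimal Python sketch of that structure, shown in Python for consistency with the other examples on this page rather than in Go; the class `ShardedMap` and its methods are hypothetical and do not mirror the library's API.

```python
import threading
import zlib
from typing import Any, Callable, Dict, List


class ShardedMap:
    """Hash map split into shards, each guarded by its own lock."""

    def __init__(self, shard_count: int = 32) -> None:
        self._shards: List[Dict[str, Any]] = [{} for _ in range(shard_count)]
        self._locks = [threading.Lock() for _ in range(shard_count)]

    def _index(self, key: str) -> int:
        # A stable hash of the key selects the shard.
        return zlib.crc32(key.encode("utf-8")) % len(self._shards)

    def set(self, key: str, value: Any) -> None:
        i = self._index(key)
        with self._locks[i]:  # only this shard is blocked
            self._shards[i][key] = value

    def get(self, key: str, default: Any = None) -> Any:
        i = self._index(key)
        with self._locks[i]:
            return self._shards[i].get(key, default)

    def update(self, key: str, fn: Callable[[Any], Any], default: Any = None) -> Any:
        """Apply fn to the current value while holding the shard lock."""
        i = self._index(key)
        with self._locks[i]:
            new_value = fn(self._shards[i].get(key, default))
            self._shards[i][key] = new_value
            return new_value

    def __len__(self) -> int:
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += len(self._shards[i])
        return total
```

Operations on keys that land in different shards never wait on each other, which is the point of the design; operations on the same shard still serialize.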
DashMap is a concurrent hash map for Rust that provides a thread-safe associative array for high-performance multi-threaded access. It serves as a parallel data structure that allows simultaneous reads and writes without requiring a global lock. The project uses a sharded-lock architecture to reduce thread contention, applying fine-grained locks at the shard level. It is a Serde-compatible map that implements serialization and deserialization to convert map data to and from common formats. The library covers parallel data storage, shared-state management, and the implementation of thread-safe caches.
Provides the underlying primitives necessary to build thread-safe sharded caches.
Cinder is a high-performance Python runtime implementation based on CPython. It is designed as an execution environment optimized for large-scale distributed systems and cloud environments. The project integrates a distributed memory cache and an asynchronous memory layer to manage data across multiple network nodes. It also provides a native C extension framework for developing high-performance compiled modules that link directly into the interpreter memory space. The system covers capabilities for asynchronous data retrieval, large-scale execution, and the integration of embedded scripting
Provides a high-performance data layer via a scalable network of memory nodes.
This project is a disk-backed key-value store and persistent data structure library for Python. It provides a mechanism for persisting mappings, sets, and queues to the local filesystem to bypass memory limitations and cache expensive function results across threads and processes. The system serves as a cross-process synchronization tool, offering distributed locks, semaphores, and barriers to coordinate shared resource access. It implements advanced caching strategies such as probabilistic stampede prevention, sharded data partitioning to increase throughput, and least-recently-used eviction
Implements sharded data partitioning to divide the cache into multiple storage buckets, reducing write contention.
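Probabilistic stampede prevention, mentioned in the description above, is commonly done with probabilistic early expiration: each reader may decide to refresh an entry slightly before it expires, so that many readers do not all recompute it at the same instant. The sketch below shows that decision rule only; the function name and parameters are hypothetical, and the project's own implementation may differ.

```python
import math
import random
from typing import Callable


def should_refresh_early(
    now: float,
    expires_at: float,
    recompute_seconds: float,
    beta: float = 1.0,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether this reader should recompute a cached value before it expires.

    recompute_seconds is how long the last recomputation took; beta > 1 makes
    early refreshes more likely, beta < 1 makes them less likely.
    """
    # rng() is assumed to return a value in [0, 1), so u lies in (0, 1] and log(u) <= 0.
    u = 1.0 - rng()
    # Subtracting a negative number pushes "now" forward by a random amount.
    return now - recompute_seconds * beta * math.log(u) >= expires_at
```

The closer an entry is to expiry and the more expensive it is to recompute, the more likely a given reader is to refresh it early, which spreads recomputation out over time.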