58 repositories
Techniques for optimizing concurrent write operations using thread-safe data copies.
Explore 58 GitHub repositories matching Data & Databases · Concurrent Write Optimizations.
LevelDB is an embedded database library and persistent storage engine that provides a sorted key-value store. It uses a log-structured merge-tree architecture to map byte arrays to values, running directly within a process to provide storage without the need for a separate server process. The system is distinguished by its use of custom comparison functions to define key ordering, enabling efficient range scans and sequenced lookups. It ensures data reliability through atomic batch execution, consistent snapshot generation, and log-based recovery after failures. The engine covers broad capab
Uses atomic batches and asynchronous writes to handle large volumes of data updates efficiently.
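The idea of an atomic batch can be sketched in a few lines. This is a toy in-memory illustration, not LevelDB's API: the `BatchedStore` class and its `write_batch` method are hypothetical names, and the batch is staged on a copy of the data so that a failing operation leaves the store untouched.

```python
import threading


class BatchedStore:
    """Toy in-memory key-value store with all-or-nothing batch writes (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def write_batch(self, ops):
        """Apply ("put", key, value) and ("delete", key) operations atomically."""
        with self._lock:
            # Stage changes on a copy so a bad operation cannot leave a partial update.
            staged = dict(self._data)
            for op in ops:
                if op[0] == "put":
                    _, key, value = op
                    staged[key] = value
                elif op[0] == "delete":
                    staged.pop(op[1], None)
                else:
                    raise ValueError(f"unknown operation: {op[0]!r}")
            # Publish every change in one step.
            self._data = staged

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)


batch_store = BatchedStore()
batch_store.write_batch([("put", "a", 1), ("put", "b", 2), ("delete", "a")])

# A batch containing an invalid operation is rejected as a whole.
try:
    batch_store.write_batch([("put", "c", 3), ("bogus", "x")])
    rejected = False
except ValueError:
    rejected = True
```

Copying the whole dictionary per batch is simple but costly for large stores; a real engine would typically append the batch to a log instead.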
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Prevents data corruption by mitigating write skew in concurrent transaction environments.
MMKV is a high-performance, cross-platform key-value storage framework designed for mobile platforms and POSIX environments, including Android, iOS, macOS, and Windows. It provides a persistence layer that utilizes memory-mapped files and binary serialization to achieve low-latency data access. The project distinguishes itself through native support for multi-process synchronization, allowing concurrent read and write operations across different application processes. It also implements security via AES encryption for data at rest, featuring symmetric encryption and key rotation to protect st
Minimizes I/O operations by comparing new values against existing ones before committing writes to disk.
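A compare-before-write check might look roughly like the sketch below. It does not reproduce MMKV's implementation or file format: `SkipUnchangedStore` is a hypothetical class that writes one file per key and keeps the last written value in memory for the comparison.

```python
import os
import tempfile


class SkipUnchangedStore:
    """Hypothetical store that skips the disk write when the value has not changed."""

    def __init__(self, directory):
        self._dir = directory
        self._cache = {}       # last value written to disk, per key
        self.disk_writes = 0   # counter kept only to make the effect visible

    def put(self, key, value: bytes) -> bool:
        """Write value for key; return False if the write was skipped."""
        if self._cache.get(key) == value:
            return False  # identical to what is already stored, so no I/O
        with open(os.path.join(self._dir, key), "wb") as f:
            f.write(value)
        self._cache[key] = value
        self.disk_writes += 1
        return True


with tempfile.TemporaryDirectory() as tmp:
    prefs = SkipUnchangedStore(tmp)
    first = prefs.put("theme", b"dark")    # new key: written
    second = prefs.put("theme", b"dark")   # same value: skipped
    third = prefs.put("theme", b"light")   # changed value: written
    writes = prefs.disk_writes
```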
Dapper is a high-performance micro-ORM and SQL object mapper for .NET. It functions as an ADO.NET extension library that adds data mapping capabilities directly to database connections, allowing SQL query results to be transformed into typed objects. The project prioritizes execution speed and low memory overhead by using intermediate language generation to map database columns to object properties. It further optimizes performance through the use of concurrent caching for mapping functions and literal value injection to improve database execution plans. The library covers a broad range of d
Uses concurrent caches to store generated mapping functions and avoid redundant compilation of query logic.
Caffeine is a high-performance caching library for the Java virtual machine designed to manage object lifecycles within the application heap. It functions as a thread-safe, memory-resident data store that reduces latency by keeping frequently accessed objects available for immediate retrieval. The library distinguishes itself through a sophisticated eviction strategy that balances recency and frequency to determine which entries to retain. It utilizes a frequency-based admission policy to evaluate the historical access patterns of new data, ensuring that the cache remains populated with the m
Provides a thread-safe data structure designed for high-concurrency environments to manage object lifecycles.
rqlite is a distributed relational database that replicates SQLite data across a cluster using the Raft consensus algorithm. It functions as a fault-tolerant storage system that provides high availability and a web API for executing SQL queries and managing relational data without requiring native database drivers. The system distinguishes itself by using an HTTP SQL interface to expose database operations and cluster management. It features a real-time change data capture stream that pushes database mutations to external HTTP endpoints via webhooks and supports the scaling of read throughput
Improves write performance using queued writes and maintains storage efficiency through automatic vacuuming.
LibSQL is a high-performance, distributed SQL database engine that extends SQLite to support remote network access, edge computing, and real-time synchronization. It functions as an embedded database library that integrates directly into application processes while providing the infrastructure to maintain consistency across multiple geographic regions. The platform distinguishes itself by enabling database interaction over standard HTTP protocols, allowing applications to query remote data sources in serverless and edge environments without requiring local filesystem access. It includes nativ
Buffers write operations locally and synchronizes them in a single transaction to optimize performance.
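As a rough illustration of buffering writes and flushing them in one transaction, the sketch below uses Python's standard `sqlite3` module against a local in-memory database. It is not the LibSQL client API; the `BufferedWriter` class and the `kv` table are assumptions made for the example.

```python
import sqlite3


class BufferedWriter:
    """Hypothetical helper: collect writes in memory, commit them in one transaction."""

    def __init__(self, conn):
        self._conn = conn
        self._pending = []

    def add(self, key, value):
        self._pending.append((key, value))

    def flush(self):
        """Write all pending rows in a single transaction; return how many were sent."""
        if not self._pending:
            return 0
        with self._conn:  # commits on success, rolls back if an exception is raised
            self._conn.executemany(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                self._pending,
            )
        count = len(self._pending)
        self._pending.clear()
        return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")

writer = BufferedWriter(conn)
for i in range(100):
    writer.add(f"k{i}", f"v{i}")

flushed = writer.flush()  # one commit instead of one hundred
row_count = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
```

The trade-off is durability: anything still in the buffer is lost if the process exits before `flush` runs.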
Presto is a distributed SQL query engine designed for high-performance analytical processing across heterogeneous data sources. It functions as a data federation platform and massively parallel processing engine, allowing users to execute interactive queries against diverse storage systems without requiring data migration. By mapping remote metadata and structures to a unified relational namespace, it enables seamless cross-platform analysis through a standard SQL interface. The engine distinguishes itself through a pluggable connector architecture and a shared-nothing distributed processing
Redistributes data across nodes to prevent skew and dynamically scales writer tasks to improve throughput.
This is a mobile object database and NoSQL local data store that replaces relational tables with a schema-based model. It functions as a reactive data store, using live object observations and change notifications to trigger automatic user interface refreshes. The system provides built-in mobile cloud data synchronization to keep local datasets consistent with a remote server across multiple devices. It also includes security features for encrypted local storage, protecting sensitive on-disk data using at-rest encryption keys and fine-grained access control. Broad capabilities include object
Supports modifying data on background threads to maintain a responsive main user interface.
FoundationDB is an ACID-compliant distributed transactional key-value store. It functions as a scalable database engine that ensures strict serializability and data consistency across a cluster of servers using a shared-nothing architecture. The system is distinguished by its multi-region replication capabilities, allowing data to be synchronized across different datacenters for high availability and disaster recovery. It utilizes optimistic concurrency control to manage distributed transactions and employs a majority-based coordination system to maintain cluster state. The platform provides
Coordinates atomic read-modify-write transformations to ensure consistency without causing transaction conflicts.
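The general shape of an atomic read-modify-write can be sketched with a single-process, lock-based store. This is only an analogy and not FoundationDB's API or its distributed mechanism: `AtomicStore` and `atomic_update` are hypothetical names, and the point is that callers hand over the transformation instead of reading and writing in two separate steps.

```python
import threading


class AtomicStore:
    """Hypothetical store where a read-modify-write runs as one indivisible step."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def atomic_update(self, key, fn, default=0):
        """Apply fn to the current value and store the result under the lock."""
        with self._lock:
            new_value = fn(self._data.get(key, default))
            self._data[key] = new_value
            return new_value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)


counters = AtomicStore()


def bump_many():
    for _ in range(1000):
        counters.atomic_update("hits", lambda v: v + 1)


workers = [threading.Thread(target=bump_many) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

total = counters.get("hits")  # no increments are lost
```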
ScyllaDB is a distributed NoSQL database engine designed for high-throughput data storage and low-latency performance at scale. It functions as a shard-aware platform that manages large-scale datasets across distributed clusters, providing a foundation for real-time applications that require consistent availability and operational stability. The system distinguishes itself through a shared-nothing architecture that distributes data across independent CPU cores to eliminate lock contention. It incorporates a user-space networking stack and an asynchronous event-driven engine to maximize hardwa
Routes requests directly to the appropriate data partition using shard-aware connectivity to maximize system throughput.
Bolt is a single-file embedded key-value store for Go applications. It is an ACID transactional database that organizes data in B+trees on disk to provide efficient sorted key retrieval and range scans. The system uses a memory-mapped model to map the database file directly into the process address space for fast random-access reads. The project distinguishes itself through a multi-version concurrency control architecture that allows multiple simultaneous readers to access a consistent snapshot of data without blocking a writer. It employs a single-writer multi-reader locking model and uses a
Combines multiple write operations into single transactions to increase throughput and reduce disk commit overhead.
Memcached is a high-performance, distributed, in-memory key-value storage and request routing engine. It functions as a volatile data store designed to accelerate dynamic applications by caching objects in RAM, thereby reducing backend database load and providing sub-millisecond response times. The system utilizes a specialized architecture that organizes memory into fixed-size slabs to minimize fragmentation and maximize throughput for high-concurrency workloads. The project distinguishes itself through a multi-threaded, lock-friendly design that scales across CPU cores and supports complex
Coordinates concurrent client requests during cache misses by designating a single winner to refresh data.
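The "single winner" idea can be illustrated in-process as follows. This sketch does not use Memcached's protocol: `SingleFlightCache` is a hypothetical class in which the first thread to miss on a key loads the value while later threads wait for that result instead of hitting the backend themselves.

```python
import threading
import time


class SingleFlightCache:
    """Hypothetical cache: on a miss, one caller loads the value and the rest wait."""

    def __init__(self, loader):
        self._loader = loader
        self._values = {}
        self._inflight = {}  # key -> Event that is set when the winner finishes
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._values:
                return self._values[key]
            event = self._inflight.get(key)
            winner = event is None
            if winner:
                event = threading.Event()
                self._inflight[key] = event
        if winner:
            try:
                value = self._loader(key)  # only the winner calls the backend
                with self._lock:
                    self._values[key] = value
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()  # wake the waiters even if the load failed
            return value
        event.wait()
        return self.get(key)  # re-check the cache; retries if the winner failed


load_calls = []


def slow_load(key):
    load_calls.append(key)
    time.sleep(0.05)  # stand-in for a slow database query
    return key.upper()


flight_cache = SingleFlightCache(slow_load)
results = []
threads = [
    threading.Thread(target=lambda: results.append(flight_cache.get("user")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With eight concurrent callers, `slow_load` runs once and all eight receive the same value.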
Garnet is a multi-threaded in-memory database and distributed key-value store. It functions as a high-performance remote cache store that implements the RESP wire protocol to maintain compatibility with existing Redis clients and libraries. The project is distinguished by a shared-memory architecture that enables parallel request processing across multiple cores for sub-millisecond latency. It features a tiered storage system that automatically offloads colder data from system memory to SSD or cloud storage layers, and includes a specialized vector search database for high-dimensional similar
Implements latch-free indexes and cache-line alignment to enable high-throughput multi-threaded access with minimal locking.
Falcor is a JavaScript library that models remote data as a single virtual JSON graph, providing a path-based query engine for efficient client-side data retrieval and updates. It represents multiple remote data sources as a unified document where entities are accessed via globally unique identity paths. The system distinguishes itself by treating the remote data model as a virtual JSON resource, allowing the client to query specific paths without managing individual endpoints. It uses a reference-aware graph model to handle many-to-many relationships and prevents data duplication. Network ef
Updates values in a graph object at specified paths and returns the modified subset asynchronously.
Metaflow is a Python machine learning framework and MLOps workflow orchestrator designed to manage the lifecycle of data pipelines from local prototyping to production. It serves as a distributed compute manager and an experiment tracking system, enabling the creation of reproducible pipelines that transition between development and high-availability production environments. The framework distinguishes itself through an integrated checkpointing system that automatically persists intermediate data artifacts to remote storage, allowing failed runs to be resumed from the last successful step. It
Provides high-throughput S3 data management using parallel operations and recursive prefix loading.
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
Manages simultaneous data updates to the same table using atomic writes.
go-cache is a thread-safe, in-memory cache library for Go that stores arbitrary objects with per-item expiration timestamps. It provides a concurrent key-value store where multiple goroutines can safely read and write shared cached data without external synchronization, using a mutex-guarded map for access control. The library distinguishes itself through its expiration management and optional disk persistence. Each cached item carries its own time-to-live, and a background goroutine periodically purges expired entries. The cache can serialize its entire contents to disk using Go's gob encodi
Shares a cache safely across multiple goroutines without manual locking or race conditions.
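A mutex-guarded map with per-item expiration, as described above, could be sketched like this. It is a Python analogy rather than go-cache's Go API: `TTLCache` and its methods are hypothetical, the injectable `clock` exists only to make the example deterministic, and purging is triggered manually where the library uses a background goroutine.

```python
import threading
import time


class TTLCache:
    """Hypothetical lock-guarded cache where every item carries its own expiry time."""

    def __init__(self, clock=time.monotonic):
        self._items = {}  # key -> (value, expires_at)
        self._lock = threading.Lock()
        self._clock = clock

    def set(self, key, value, ttl):
        with self._lock:
            self._items[key] = (value, self._clock() + ttl)

    def get(self, key, default=None):
        with self._lock:
            entry = self._items.get(key)
            if entry is None:
                return default
            value, expires_at = entry
            if self._clock() >= expires_at:
                del self._items[key]  # drop the stale entry on access
                return default
            return value

    def purge_expired(self):
        """Remove all expired entries; return how many were removed."""
        now = self._clock()
        with self._lock:
            expired = [k for k, (_, exp) in self._items.items() if now >= exp]
            for k in expired:
                del self._items[k]
        return len(expired)


fake_now = [0.0]  # controllable clock for the demonstration
ttl_cache = TTLCache(clock=lambda: fake_now[0])
ttl_cache.set("session", "abc", ttl=10)
ttl_cache.set("token", "xyz", ttl=5)

before = ttl_cache.get("session")
fake_now[0] = 11.0                  # advance past both expiry times
after = ttl_cache.get("session")    # expired, removed on access
purged = ttl_cache.purge_expired()  # removes the remaining "token" entry
```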
Delta is a lakehouse table format that brings ACID transactions and data warehouse consistency to large scale data lakes on cloud object storage. It serves as an ACID transaction manager, coordinating atomic commits and serializable isolation for concurrent reads and writes across distributed compute engines. The project provides a multi-engine interoperability layer that uses format translation to allow diverse SQL engines and processing frameworks to read and write the same tables. It functions as a data versioning system, utilizing a transaction log to enable time travel, historical snapsh
Prevents data corruption and write skew anomalies by detecting conflicts between simultaneous write operations.
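Conflict detection between simultaneous writers can be illustrated with a version check at commit time. The sketch below is a simplified, single-process model and not Delta's transaction log protocol: `VersionedTable` and `ConflictError` are hypothetical, and conflicts are detected at whole-table granularity. The scenario is the classic write-skew case in which two people each go off call after seeing that someone else is still on call.

```python
import threading


class ConflictError(Exception):
    """Raised when a commit is based on a stale snapshot."""


class VersionedTable:
    """Hypothetical table that rejects commits made against an outdated version."""

    def __init__(self):
        self._rows = {}
        self._version = 0
        self._lock = threading.Lock()

    def snapshot(self):
        """Return (version, copy of rows) for a writer to base its decision on."""
        with self._lock:
            return self._version, dict(self._rows)

    def commit(self, read_version, changes):
        """Apply changes only if nothing was committed since read_version."""
        with self._lock:
            if read_version != self._version:
                raise ConflictError(
                    f"read version {read_version}, table is at {self._version}"
                )
            self._rows.update(changes)
            self._version += 1
            return self._version


on_call = VersionedTable()
on_call.commit(0, {"alice": True, "bob": True})

# Both writers read the same snapshot and each sees two people on call.
v_a, rows_a = on_call.snapshot()
v_b, rows_b = on_call.snapshot()

on_call.commit(v_a, {"alice": False})  # first writer wins
try:
    on_call.commit(v_b, {"bob": False})  # second writer used a stale snapshot
    conflict_detected = False
except ConflictError:
    conflict_detected = True

final_version, final_rows = on_call.snapshot()
```

The rejected writer would then take a fresh snapshot, see that only one person remains on call, and decide again.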
better-sqlite3 is a high-performance SQLite3 client for Node.js that executes queries synchronously, returning results directly without callbacks or promises. It compiles as a native addon using N-API, binding directly to the SQLite3 C library for immediate query execution and zero-copy result serialization into native JavaScript objects. The library is optimized for Write-Ahead Logging (WAL) mode, enabling faster concurrent reads and writes in web applications. It provides durability level tuning through the synchronous pragma, allowing adjustments between FULL, NORMAL, and OFF modes to bala
Improves read and write performance for concurrent database access by enabling Write-Ahead Logging.
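Enabling Write-Ahead Logging is a one-line pragma in SQLite itself. The sketch below shows it with Python's standard `sqlite3` module instead of better-sqlite3's Node.js API; the file name and the `events` table are assumptions for the example. WAL requires a file-backed database, so a temporary directory is used.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.db")

    writer_conn = sqlite3.connect(path)
    # The pragma returns the journal mode that is now in effect.
    journal_mode = writer_conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # Durability tuning: NORMAL trades some durability for fewer fsync calls.
    writer_conn.execute("PRAGMA synchronous=NORMAL")

    writer_conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")
    with writer_conn:  # one committed transaction
        writer_conn.execute("INSERT INTO events (body) VALUES (?)", ("hello",))

    # A second connection reads the committed data from the same file.
    reader_conn = sqlite3.connect(path)
    event_count = reader_conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

    reader_conn.close()
    writer_conn.close()
```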