Why is ClickHouse/ClickHouse a recommended Data Storage Optimizers repository?

Automatically decomposes semi-structured data into dense internal columns to optimize storage space and query performance.

Why is facebook/rocksdb a recommended Data Storage Optimizers repository?

Optimizes remote storage access using asynchronous I/O and prefetching to reduce latency on network filesystems.

Why is OpenZeppelin/openzeppelin-contracts a recommended Data Storage Optimizers repository?

Writes data to specific storage slots and packs short strings to reduce gas costs and prevent storage conflicts.

Why is pubkey/rxdb a recommended Data Storage Optimizers repository?

Compresses data keys and values to minimize disk space and network bandwidth usage.

Why is Vonng/ddia a recommended Data Storage Optimizers repository?

Details indexing strategies and storage engine designs tailored for transactional and analytical workloads.

Why is ethereumbook/ethereumbook a recommended Data Storage Optimizers repository?

Explains how pre-declaring accessed addresses and storage slots in an access list reduces the total gas cost of complex smart contract interactions.

Why is apache/mxnet a recommended Data Storage Optimizers repository?

Restructures and compresses data formats to improve storage efficiency and read performance for large datasets.

Why is langchain-ai/langchainjs a recommended Data Storage Optimizers repository?

Minimizes storage footprint for large data histories through checkpoint pruning and delta-based updates.

Why is stefan-jansen/machine-learning-for-trading a recommended Data Storage Optimizers repository?

Optimizes storage performance by partitioning large datasets into time-based chunks.

Why is VictoriaMetrics/VictoriaMetrics a recommended Data Storage Optimizers repository?

Reduces storage footprint and improves query performance by automatically deduplicating data and downsampling older metrics.

49 repositories

Awesome GitHub Repositories · Data Storage Optimizers

Tools and techniques for restructuring and compressing data formats to improve storage efficiency and retrieval speed.

Distinguishing note: Focuses on structural decomposition of semi-structured data for storage optimization, distinct from general-purpose database engines.

Explore 49 awesome GitHub repositories in Data & Databases · Data Storage Optimizers.


ClickHouse/ClickHouse
48,229 stars
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through ad
Automatically decomposes semi-structured data into dense internal columns to optimize storage space and query performance.
C++ · ai · analytics · big-data
facebook/rocksdb
31,767 stars
RocksDB is a high-performance, embeddable persistent key-value library and storage engine based on Log-Structured Merge-trees. It is designed to provide durable storage for large-scale datasets, integrating directly into applications to manage data on flash and RAM-based hardware. The engine is distinguished by its focus on minimizing read and write amplification through multi-threaded compaction and custom memory allocators. It features specialized optimizations for flash storage, including support for zoned block devices, and provides the ability to extend store behavior via external plugin
Optimizes remote storage access using asynchronous I/O and prefetching to reduce latency on network filesystems.
C++ · database · storage-engine
OpenZeppelin/openzeppelin-contracts
27,157 stars
OpenZeppelin Contracts is a library of modular, secure, and reusable smart contract components designed for the development of decentralized applications. It provides a foundational framework for building standard-compliant contracts, offering battle-tested implementations for token standards, access control, and common utility patterns. The project distinguishes itself through its comprehensive support for complex architectural patterns, including proxy-based upgradeability, role-based access control, and account abstraction. It enables developers to implement modular logic injection via hoo
Writes data to specific storage slots and packs short strings to reduce gas costs and prevent storage conflicts.
Solidity · ethereum · evm · security
pubkey/rxdb
23,048 stars
This project is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of truth, it enables applications to remain functional without network connectivity, automatically reconciling changes with remote backends once a connection is restored. The database distinguishes itself through a modular architecture that supports cross-environment synchronization and high-performance d
Compresses data keys and values to minimize disk space and network bandwidth usage.
TypeScript · angular · browser-database · couchdb
Vonng/ddia
22,648 stars
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Details indexing strategies and storage engine designs tailored for transactional and analytical workloads.
Python · book · database · ddia
ethereumbook/ethereumbook
21,521 stars
This project serves as a comprehensive technical reference and educational platform for the Ethereum ecosystem. It provides a deep dive into the fundamental architecture of decentralized ledger systems, covering the core mechanisms that enable trustless state transitions, cryptographic security, and network consensus. The documentation distinguishes itself by bridging high-level conceptual frameworks with practical implementation details. It details the lifecycle of smart contract development, from source code compilation and bytecode analysis to deployment and interaction patterns. Furthermo
Explains how pre-declaring accessed addresses and storage slots in an access list reduces the total gas cost of complex smart contract interactions.
blockchain · book · dapp
apache/mxnet
20,829 stars
This project is a deep learning framework designed for constructing, training, and deploying neural networks across diverse hardware environments. It functions as a high-performance tensor computation library that provides both imperative and symbolic programming interfaces, allowing developers to balance flexible, step-by-step model building with the efficiency of compiled computation graphs. The framework distinguishes itself through a hybrid execution engine that integrates declarative graph compilation with imperative runtime logic. It supports scalable, distributed training across multip
Restructures and compresses data formats to improve storage efficiency and read performance for large datasets.
C++ · mxnet
langchain-ai/langchainjs
17,818 stars
LangChain.js is a framework for building, executing, and monitoring stateful agentic applications. It provides an orchestration engine that models workflows as directed graphs, allowing developers to connect language models, data sources, and external tools into modular, multi-step processes. The platform distinguishes itself through its focus on stateful execution and human-in-the-loop control. It manages agent lifecycles by persisting execution state across threads, enabling fault tolerance and the ability to pause workflows at designated breakpoints for manual review or modification. This
Minimizes storage footprint for large data histories through checkpoint pruning and delta-based updates.
TypeScript
stefan-jansen/machine-learning-for-trading
16,552 stars
This project is a comprehensive framework for engineering financial data pipelines, designed to automate the collection, cleaning, and synchronization of large-scale market datasets. It functions as a quantitative trading data engine, providing the infrastructure necessary to manage historical and real-time asset pricing information for research and machine learning workflows. The system distinguishes itself through a configuration-driven approach to orchestration, allowing users to manage complex data acquisition tasks across multiple financial providers. It features resilient middleware tha
Optimizes storage performance by partitioning large datasets into time-based chunks.
Jupyter Notebook · artificial-intelligence · data-science · deep-learning
VictoriaMetrics/VictoriaMetrics
16,343 stars
VictoriaMetrics is a high-performance, scalable time series database and observability platform designed for long-term storage and analysis of metric, log, and trace data. It functions as a unified backend for monitoring ecosystems, offering full compatibility with industry-standard protocols and query languages. The system is built to handle massive data volumes through a distributed architecture that supports horizontal scaling and efficient data lifecycle management. The platform distinguishes itself through a storage engine that utilizes consistent hashing for data sharding and log-struct
Reduces storage footprint and improves query performance by automatically deduplicating data and downsampling older metrics.
Go · database · grafana · graphite
dask/dask
13,746 stars
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture lets the system automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabling global graph optimization and efficient resource allocation. It incorporates memory-aware data spilling to avoid system failures when processing datasets that exceed available memory, and uses task graph fusion to combine sequences of operations into single execution steps, minimizing scheduling overhead and inter-node communication. The platform provides a comprehensive capability surface for large-scale data analysis, including support for distributed machine learning, high-performance computing integration, and parallel data processing. It offers extensive tools for cluster lifecycle management, performance profiling, and real-time monitoring of task execution. Users can deploy these environments on diverse infrastructure, including local hardware, cloud providers, containerized systems, and high-performance computing clusters.
Provides tools for optimizing data storage formats and compression schemes to improve performance across distributed datasets.
Python · dask · numpy · pandas
microsoft/LoRA
13,264 stars
LoRA is a framework for parameter-efficient fine-tuning of large-scale neural networks. It functions by injecting trainable low-rank decomposition matrices into frozen model layers, allowing for task-specific adaptation while preserving the integrity of the original base model weights. The project distinguishes itself by enabling the direct merging of these trained low-rank matrices into primary model weights. This process eliminates additional computational overhead during inference, ensuring that adapted models maintain the same performance characteristics as the original architecture. Furt
Minimizes storage requirements by saving only small task-specific adaptation matrices instead of storing entire sets of original model weights.
Python · adaptation · deberta · deep-learning
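The storage saving described for microsoft/LoRA comes down to simple arithmetic: a dense d x k weight matrix holds d * k parameters, while two low-rank factors of rank r hold only r * (d + k). The sketch below works through that count. The layer size and rank are hypothetical values chosen for illustration, and the helper names are not part of the LoRA codebase.

```python
def full_weight_params(d: int, k: int) -> int:
    """Parameters in one dense d x k weight matrix."""
    return d * k


def low_rank_params(d: int, k: int, r: int) -> int:
    """Parameters in the two low-rank factors B (d x r) and A (r x k)."""
    return r * (d + k)


# Hypothetical layer size and rank, chosen only for illustration.
d, k, r = 4096, 4096, 8
full = full_weight_params(d, k)
adapter = low_rank_params(d, k, r)
print(f"full: {full:,}  adapter: {adapter:,}  ratio: {full // adapter}x")
```

Under these assumed dimensions the adapter needs 65,536 parameters against 16,777,216 for the full matrix, so only the small factors have to be saved per task.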
aws/aws-cdk
12,817 stars
The AWS Cloud Development Kit is an infrastructure-as-code framework that enables developers to define and provision cloud resources using familiar programming languages. By utilizing construct-based synthesis, it translates high-level, object-oriented code into declarative templates, allowing for the automated management of complex cloud environments through a centralized, code-driven control plane. The framework distinguishes itself through its ability to model infrastructure as a dependency-aware resource graph, ensuring that components are provisioned and updated in the correct order. It
Provides automated data partitioning and storage format optimization for improved query performance.
TypeScript · aws · cloud-infrastructure · hacktoberfest
ramsey/uuid
12,620 stars
This PHP library provides tools for generating and validating universally unique identifiers according to RFC 4122 standards. It implements a generation tool for creating version 1, 3, 4, and 5 identifiers, as well as sequential and Nil UUIDs. The library features specialized capabilities for transforming identifiers between hexadecimal strings, binary bytes, integers, and date objects. It supports the generation of sequential identifiers to improve database indexing and storage performance, as well as deterministic name-based identifiers using MD5 or SHA-1 hashing. The project includes a va
Implements sequential identifier generation and binary encoding to optimize database indexing and reduce storage fragmentation.
PHP · guid · identifiers · php
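The two ideas in the ramsey/uuid summary, binary encoding and sequential identifiers, can be illustrated with Python's standard uuid module. This is a sketch of the general technique and not the PHP library's API. The time_ordered_id helper is hypothetical: it assumes a 6-byte millisecond timestamp prefix followed by 10 random bytes.

```python
import os
import uuid

# Binary encoding: the same identifier takes 36 characters as text
# but only 16 bytes in its raw form.
u = uuid.uuid4()
text_form = str(u)
binary_form = u.bytes
round_trip = uuid.UUID(bytes=binary_form)


def time_ordered_id(timestamp_ms: int) -> bytes:
    """Hypothetical sequential ID: 6-byte big-endian timestamp, then 10 random bytes.

    Because the timestamp leads, IDs created later compare greater byte-wise,
    so new rows land at the end of an index instead of at random positions.
    """
    return timestamp_ms.to_bytes(6, "big") + os.urandom(10)


earlier = time_ordered_id(1_700_000_000_000)
later = time_ordered_id(1_700_000_000_001)
print(len(text_form), len(binary_form), earlier < later)
```

Storing the 16-byte form in a binary column rather than the 36-character string is where the space saving comes from; the ordered prefix is what keeps index inserts roughly append-only.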
kopia/kopia
12,612 stars
Kopia is a backup utility designed to create encrypted, deduplicated, and compressed snapshots of files and directories. It functions as a client-side tool that secures data locally before transmitting it to various storage targets, ensuring that sensitive information remains protected throughout the backup process. The system utilizes content-addressable block storage and metadata-driven versioning to identify and remove redundant data across multiple snapshots. By employing a pluggable storage abstraction layer, it supports a wide range of local, network, and cloud-based storage providers,
Optimizes storage efficiency by applying compression and deduplication before writing to disk.
Go · backup · cloud · deduplication
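The Kopia description mentions content-addressable block storage with deduplication and compression. A minimal sketch of that general idea follows. It assumes fixed-size blocks, SHA-256 keys, and zlib compression for simplicity, and it omits encryption; BlockStore and snapshot are hypothetical names, and none of this reflects Kopia's actual implementation.

```python
import hashlib
import zlib


class BlockStore:
    """Toy content-addressable store: each block is keyed by its SHA-256 hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Deduplication: identical content hashes to the same key,
        # so it is compressed and stored only once.
        if key not in self.blocks:
            self.blocks[key] = zlib.compress(data)
        return key

    def get(self, key: str) -> bytes:
        return zlib.decompress(self.blocks[key])


def snapshot(store: BlockStore, data: bytes, block_size: int = 4096) -> list[str]:
    """Split data into fixed-size blocks and return the list of block keys."""
    return [store.put(data[i:i + block_size]) for i in range(0, len(data), block_size)]


store = BlockStore()
first = b"A" * 4096 + b"B" * 4096
second = first + b"C" * 4096           # same content plus one new block
snap1 = snapshot(store, first)
snap2 = snapshot(store, second)
restored = b"".join(store.get(k) for k in snap2)
print(len(snap1), len(snap2), len(store.blocks))
```

The second snapshot references three blocks but adds only one new entry to the store, because its first two blocks already exist under the same keys.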
citusdata/citus
12,562 stars
Citus is a PostgreSQL extension that transforms a standard database into a distributed system. It functions as a sharding framework and distributed SQL engine, enabling horizontal scaling by partitioning tables across a cluster of nodes. By utilizing a coordinator-worker topology, the system manages metadata and routes queries to the appropriate nodes, allowing for parallel execution of complex operations across distributed data shards. The platform distinguishes itself through its specialized support for multi-tenant architectures and real-time analytical processing. It enables tenant-based
Incorporates semi-structured data formats into distributed tables to track complex metrics without requiring rigid schema changes for every new attribute.
C · citus · citus-extension · database
openzfs/zfs
12,293 stars
ZFS is an enterprise-grade file system and logical volume manager that integrates storage pooling with advanced data protection. It functions as a storage engine that aggregates multiple physical devices into a unified resource pool, allowing for the dynamic allocation of capacity across individual file systems. The system utilizes a transactional, copy-on-write architecture that ensures file system consistency through intent logging and atomic operations. It maintains data integrity by organizing blocks into a hierarchical tree structure, where cryptographic checksums are used to detect and
Adjusts input and output scheduling and workload parameters to maximize data transfer speeds.
C · file-system · openzfs · system-software
datahub-project/datahub
12,141 stars
DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations. The platform distinguishes itself through its focus on grounding artificial intelligence and autono
Analyzes usage patterns to identify and remove redundant datasets, optimizing cloud storage costs.
Python · data-catalog · data-discovery · data-governance
coleifer/peewee
11,971 stars
Peewee is a SQL object-relational mapper and query builder that provides an object-oriented interface for mapping application classes to relational database tables. It functions as a relational database toolkit for managing schemas, executing migrations, and handling complex table relationships. The project distinguishes itself by providing an asyncio database driver for non-blocking database operations, ensuring event loop responsiveness. It also supports semi-structured data storage, allowing the storage and querying of flexible JSON documents within traditional relational database systems.
Integrates flexible JSON and HStore data formats into relational tables for semi-structured storage.
Python · asyncio · dank · fastapi
realm/realm-java
11,464 stars
Realm Java is a NoSQL mobile object database and reactive database engine. It provides a persistent local data store that saves native objects directly to disk, replacing traditional SQL storage and object-relational mapping layers. The system functions as a real-time data synchronizer, coordinating local database changes with a cloud backend across multiple devices. It integrates a reactive engine that uses change listeners and asynchronous event streams to automatically update user interfaces when underlying data changes. The project covers object-oriented data modeling, CRUD operations, a
Includes tools for reducing database file size through compaction and managing memory via frozen data snapshots.
Java

Awesome Data Storage Optimizers GitHub Repositories

ClickHouse/ClickHouse

facebook/rocksdb

OpenZeppelin/openzeppelin-contracts

pubkey/rxdb

Vonng/ddia

ethereumbook/ethereumbook

apache/mxnet

langchain-ai/langchainjs

stefan-jansen/machine-learning-for-trading

VictoriaMetrics/VictoriaMetrics

dask/dask

microsoft/LoRA

aws/aws-cdk

ramsey/uuid

kopia/kopia

citusdata/citus

openzfs/zfs

datahub-project/datahub

coleifer/peewee

realm/realm-java
