Why is mem0ai/mem0 a recommended Data Compression GitHub repository?

Summarizes and structures raw interaction data into compact, machine-readable formats to optimize storage efficiency and retrieval latency.

Why is google/leveldb a recommended Data Compression GitHub repository?

Compresses and decompresses data to balance processing performance with disk space reduction.

Why is dmlc/xgboost a recommended Data Compression GitHub repository?

Handles massive datasets by storing data in compressed on-disk blocks and loading them as needed.

Why is pubkey/rxdb a recommended Data Compression GitHub repository?

Reduces storage footprint by automatically mapping long attribute names to shorter keys based on schema.

Why is forem/forem a recommended Data Compression GitHub repository?

Provides tools to trigger compression tasks on demand for better resource management and storage optimization.

Why is timescale/timescaledb a recommended Data Compression GitHub repository?

Compresses time-series data by converting row-oriented data to a columnar format with type-specific compression.

Why is OthmanAdi/planning-with-files a recommended Data Compression GitHub repository?

Optimizes internal data formats to reduce computational overhead and lower the cost of processing large-scale organizational knowledge.

Why is StockSharp/StockSharp a recommended Data Compression GitHub repository?

Uses specialized compression formats to reduce disk usage and increase read speed for tick-level data.

Why is dusty-nv/jetson-inference a recommended Data Compression GitHub repository?

Utilizes AI-enabled compression to represent large-scale volumes more efficiently.

Why is WiseLibs/better-sqlite3 a recommended Data Compression GitHub repository?

Processes queries efficiently on multi-gigabyte databases using proper indexing and joins.

25 repositories

Awesome GitHub Repositories · Data Compression

Techniques for structuring and summarizing raw data into compact formats to optimize storage and latency.

Distinguishing note: Focuses on schema-based summarization for AI memory, distinct from general file compression.

Explore 25 awesome GitHub repositories matching data & databases · Data Compression. Refine with filters or upvote what's useful.


mem0ai/mem0
58,698 · View on GitHub
Mem0 is an agent-agnostic memory layer designed to provide intelligent agents with long-term persistence and cross-session state management. By acting as a centralized service, it allows diverse AI agents to recall user preferences, past interactions, and historical context, ensuring continuity across multiple workflows and independent agent systems. The platform distinguishes itself through a multi-signal retrieval engine that combines semantic vectors, keyword matching, and entity-linked metadata to surface the most relevant information. It employs an adaptive memory engine that automatical
Summarizes and structures raw interaction data into compact, machine-readable formats to optimize storage efficiency and retrieval latency.
Python · agents · ai · ai-agents
google/leveldb
39,152 · View on GitHub
LevelDB is an embedded database library and persistent storage engine that provides a sorted key-value store. It uses a log-structured merge-tree architecture to map byte arrays to values, running directly within a process to provide storage without the need for a separate server process. The system is distinguished by its use of custom comparison functions to define key ordering, enabling efficient range scans and sequenced lookups. It ensures data reliability through atomic batch execution, consistent snapshot generation, and log-based recovery after failures. The engine covers broad capab
Compresses and decompresses data to balance processing performance with disk space reduction.
C++
dmlc/xgboost
28,471 · View on GitHub
XGBoost is a distributed machine learning library for implementing scalable gradient boosting decision trees used for regression, classification, and ranking. It functions as a predictive model framework and a cross-language toolkit, providing a core implementation with native bindings for Python, R, Java, Scala, and C++. The system is designed as a GPU-accelerated library that utilizes CUDA and NCCL to speed up the training of decision tree ensembles. It operates as a distributed framework capable of scaling training and prediction across multi-node clusters and GPU environments to process m
Handles massive datasets by storing data in compressed on-disk blocks and loading them as needed.
C++ · distributed-systems · gbdt · gbm
pubkey/rxdb
23,048 · View on GitHub
This project is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of truth, it enables applications to remain functional without network connectivity, automatically reconciling changes with remote backends once a connection is restored. The database distinguishes itself through a modular architecture that supports cross-environment synchronization and high-performance d
Reduces storage footprint by automatically mapping long attribute names to shorter keys based on schema.
TypeScript · angular · browser-database · couchdb
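
The schema-based key mapping described for RxDB can be illustrated with a small Python sketch. This is not RxDB's implementation: the function names (`build_key_table`, `compress_doc`, `decompress_doc`), the field names, and the hexadecimal short-key scheme are all assumptions made for illustration. The idea is only that a table derived from the schema lets long attribute names be swapped for short keys on write and restored on read.

```python
def build_key_table(field_names):
    """Map each schema attribute to a short key (hypothetical scheme).

    Sorting makes the table deterministic, so any client holding the
    same schema derives the same mapping.
    """
    return {name: format(i, "x") for i, name in enumerate(sorted(field_names))}


def compress_doc(doc, table):
    """Replace long attribute names with their short keys."""
    return {table[key]: value for key, value in doc.items()}


def decompress_doc(doc, table):
    """Restore the original attribute names from short keys."""
    reverse = {short: name for name, short in table.items()}
    return {reverse[key]: value for key, value in doc.items()}


# Hypothetical schema and document, used only for this example.
schema_fields = ["firstName", "lastName", "shoppingCartItems"]
table = build_key_table(schema_fields)

doc = {"firstName": "Ada", "lastName": "Lovelace", "shoppingCartItems": 3}
packed = compress_doc(doc, table)
restored = decompress_doc(packed, table)
```

The saving comes from the key names alone, so it grows with the number of stored documents rather than with the size of each value.
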
forem/forem
22,726 · View on GitHub
Forem is an open-source platform designed for building and managing technical communities. It functions as a social publishing engine that enables members to share long-form content, participate in threaded discussions, and engage through social interactions. The platform provides tools for organizations to maintain branded profiles, host community hackathons, and facilitate collaborative learning through structured educational tracks. Beyond its social features, Forem integrates advanced capabilities for AI agent workflow orchestration and codebase knowledge graphing. It allows developers to
Provides tools to trigger compression tasks on demand for better resource management and storage optimization.
Ruby · community · discussion · feedback
timescale/timescaledb
21,876 · View on GitHub
TimescaleDB is an open-source PostgreSQL extension that adds native time-series capabilities to the database. At its core, it transforms standard PostgreSQL tables into hypertables—automatically partitioned by time intervals—so data is stored in fixed-size chunks without manual sharding. The extension includes a library of over 200 built-in SQL functions purpose-built for time-series workloads, such as time bucketing, gap filling, percentile estimation, and time-weighted averages. What distinguishes TimescaleDB from generic PostgreSQL is its set of integrated time-series features that work th
Compresses time-series data by converting row-oriented data to a columnar format with type-specific compression.
C · analytics · database · financial-analysis
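
A rough Python sketch of the row-to-columnar idea follows. The listing does not say which algorithms TimescaleDB applies to each type, so delta encoding for timestamps and run-length encoding for repeated values are stand-ins chosen for illustration, and every name and sample row here is hypothetical.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row tuples into one list per column."""
    return [list(column) for column in zip(*rows)]


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    out, total = [], 0
    for delta in deltas:
        total += delta
        out.append(total)
    return out


def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the full list."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical sensor rows: (timestamp, device, reading).
rows = [
    (1000, "sensor-a", 21),
    (1010, "sensor-a", 21),
    (1020, "sensor-a", 22),
    (1030, "sensor-b", 22),
]

timestamps, devices, readings = to_columns(rows)
packed_timestamps = delta_encode(timestamps)   # regular intervals become small numbers
packed_devices = rle_encode(devices)           # repeated labels become short runs
packed_readings = rle_encode(readings)
```

Grouping values by column is what makes the per-type encodings effective: neighbouring values in a column tend to be similar, while neighbouring values in a row are not.
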
OthmanAdi/planning-with-files
14,139 · View on GitHub
Planning with files is an enterprise knowledge graph platform designed to transform unstructured organizational data into a searchable, interconnected network. By utilizing a graph-based retrieval-augmented generation engine, the system grounds language model outputs in verified internal data, ensuring that responses are explainable, traceable, and free from hallucinations. The platform distinguishes itself through a focus on data sovereignty and secure, private infrastructure deployment. It enables organizations to maintain full control over sensitive information by processing data locally o
Optimizes internal data formats to reduce computational overhead and lower the cost of processing large-scale organizational knowledge.
Python · adal · agent · agent-skills
StockSharp/StockSharp
10,126 · View on GitHub
StockSharp is an algorithmic trading platform and quantitative framework used for developing and deploying trading robots across stock, forex, and cryptocurrency markets. It functions as a multi-asset trading gateway and a dedicated development environment for building, debugging, and scheduling automated strategies. The platform includes a visual strategy workflow editor that maps logic blocks to executable code and a simulation engine that replays historical tick data to validate trading logic. It utilizes a plugin-based broker integration system to normalize diverse exchange protocols into
Uses specialized compression formats to reduce disk usage and increase read speed for tick-level data.
C#
dusty-nv/jetson-inference
8,734 · View on GitHub
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
Utilizes AI-enabled compression to represent large-scale volumes more efficiently.
C++ · caffe · computer-vision · deep-learning
WiseLibs/better-sqlite3
7,311 · View on GitHub
better-sqlite3 is a high-performance SQLite3 client for Node.js that executes queries synchronously, returning results directly without callbacks or promises. It compiles as a native addon using N-API, binding directly to the SQLite3 C library for immediate query execution and zero-copy result serialization into native JavaScript objects. The library is optimized for Write-Ahead Logging (WAL) mode, enabling faster concurrent reads and writes in web applications. It provides durability level tuning through the synchronous pragma, allowing adjustments between FULL, NORMAL, and OFF modes to bala
Processes queries efficiently on multi-gigabyte databases using proper indexing and joins.
JavaScript · database · sql · sqlite
gilbarbara/logos
6,754 · View on GitHub
Logos is a curated collection of optimized SVG logos for developer tools and brands, stored as individual SVG files in a flat directory structure. The collection is manually selected and optimized to ensure quality and consistency, with each logo served as a raw SVG file that browsers and tools can render natively. The collection supports direct file-system access through its flat directory storage, and includes a lightweight index of brand names and file paths for fast keyword-based logo lookup. Logos are delivered as static assets over HTTP, relying on standard web server caching for perfor
Applies SVG optimization to reduce file size while preserving visual fidelity of logos.
SVG · logos · svg
madler/zlib
6,687 · View on GitHub
zlib is a lossless data compression library that implements the deflate compression algorithm, combining LZ77 sliding window and Huffman coding. It provides the core compression and decompression engines, along with support for gzip, zlib, and raw deflate stream formats, enabling data to be compressed and restored without any loss of information. The library offers a range of capabilities for handling compressed data, including single-call memory and file operations, as well as incremental stream-based processing for working with data larger than available memory. It includes mechanisms for a
Supports processing data exceeding 4 GB without loss or corruption.
C
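
The incremental, stream-based processing mentioned in the zlib description can be tried from Python, whose standard `zlib` module wraps the library. The sketch below is a minimal illustration, not taken from the zlib sources: the helper names `compress_stream` and `decompress_stream`, the sample data, and the 64-byte read size are assumptions. Input is fed chunk by chunk, so the full payload never has to sit in memory at once.

```python
import zlib


def compress_stream(chunks, level=6):
    """Yield compressed bytes for an iterable of byte chunks."""
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        data = compressor.compress(chunk)
        if data:  # the compressor may buffer and return nothing yet
            yield data
    yield compressor.flush()  # emit whatever is still buffered


def decompress_stream(chunks):
    """Yield decompressed bytes for an iterable of compressed chunks."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail


# Hypothetical input: a header line followed by many repeated records.
original = [b"time,value\n"] + [b"2024-01-01,42\n"] * 1000
compressed = b"".join(compress_stream(original))

# Feed the compressed bytes back in small pieces, as a file reader might.
pieces = [compressed[i:i + 64] for i in range(0, len(compressed), 64)]
restored = b"".join(decompress_stream(pieces))
```

In a real pipeline the chunks would come from a file or socket read loop instead of an in-memory list.
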
apache/iotdb
6,286 · View on GitHub
Apache IoTDB is a time-series database designed for the Internet of Things, purpose-built to ingest high-volume data from millions of low-power devices and store timestamp-value pairs with configurable data types and encoding schemes. It organizes time series data and device metadata in a tree-like hierarchy, enabling efficient management of complex industrial sensor networks. The database supports rich querying capabilities, including time-aligned data retrieval across multiple devices, time-based aggregation like downsampling, and frequency-domain signal analysis. It provides high-throughpu
Compresses time series data with high-ratio algorithms to reduce hardware storage costs.
Java · big-data · database · iot
NVIDIA/Isaac-GR00T
6,222 · View on GitHub
Reduces the storage and memory footprint of sparse 3D volumes like smoke and clouds using neural compression techniques.
Jupyter Notebook
balloonwj/CppGuide
6,030 · View on GitHub
CppGuide is a curated collection of educational resources and practical guides focused on C++ server development, Linux kernel internals, concurrent programming, network protocols, and security exploitation. It provides structured learning paths for backend developers, covering everything from interview preparation to building high-performance network servers and understanding operating system fundamentals. The guide distinguishes itself by offering in-depth, hands-on tutorials that walk through real-world implementations, including building a Redis-like server from scratch, designing custom
Compresses swapped-out pages in RAM before they reach disk to reduce memory pressure.
pheralb/svgl
5,578 · View on GitHub
Applies SVGO compression to SVG logos on demand without modifying the stored originals.
TypeScript · hacktoberfest · logos · open-source
loro-dev/loro
5,374 · View on GitHub
Loro is a conflict-free replicated data type (CRDT) framework and collaborative state engine designed for building real-time collaborative applications. It provides a distributed data synchronizer that enables multiple users to edit shared documents and complex nested structures—such as maps, lists, trees, and counters—with automatic state convergence without requiring a central server. The project distinguishes itself through a versioned document store that supports branching, forking, and merging via a directed acyclic graph of causal operation history. It enables advanced version control c
Computes compact diffs between document versions by removing canceling operations to optimize network data transfer.
Rust · collaborative-editing · crdt · local-first
awslabs/gluon-ts
5,200 · View on GitHub
GluonTS is a framework for probabilistic time series forecasting, designed to predict future values as probability distributions with confidence intervals. It supports both conventional model training and zero-shot forecasting, where pretrained models generate forecasts for new series without additional training. The project is distinguished by integrating a wide range of forecasting approaches into a unified workflow. This includes deep learning architectures such as recurrent neural networks and causal convolutions, as well as integration of external statistical models, the Prophet library, and R packages. The toolkit provides a comprehensive surface for time series data engineering, covering dataset augmentation, splitting, and conversion of raw temporal data into tensors. It also includes a set of evaluation tools for measuring forecast accuracy and uncertainty intervals, along with tools for dataset persistence using formats such as Arrow and Parquet. The framework supports deploying forecasting models within cloud infrastructure.
Writes datasets to binary Arrow or Parquet files using configurable compression and array flattening.
Python
OpenTSDB/opentsdb
5,068 · View on GitHub
OpenTSDB is a distributed time series database and metrics engine designed to store and manage massive volumes of high-cardinality system metrics. It serves as a data store and analytics platform that enables large-scale metric ingestion and infrastructure performance monitoring across a distributed cluster. The system is distinguished by a distributed storage abstraction that supports multiple backends such as HBase, Cassandra, and Google Bigtable. It uses a hierarchical metric tree to organize time series and numeric ID indexing to reduce storage footprint and speed up lookups for tagged metrics. The project covers broad capability areas, including time series analysis with distributed percentile calculations and downsampling, as well as comprehensive metadata management. It provides API integration for data ingestion and querying, off-heap caching to improve performance, and tools for data integrity auditing and anomaly analysis. The system is managed through a command-line interface for database administration and metric tree synchronization.
Merges multiple columns within a row into a single column to reduce the physical disk space usage.
Java
m3db/m3
4,895 · View on GitHub
m3 is a distributed time series database designed for high-resolution metrics and high-cardinality data management. It functions as a scalable storage system and a multi-cluster query engine, providing a distributed metrics aggregator capable of downsampling and summarizing data before it is committed to storage. The project distinguishes itself through a coordinated cluster model using etcd for node membership and shard placement. It supports multiple ingestion protocols, including the Prometheus remote write protocol, InfluxDB line protocol, and Graphite Carbon plaintext protocol, and provi
Implements specialized compression algorithms and hybrid encoding to reduce the memory and disk footprint of time series.
Go

