117 repositories
Techniques for reducing the storage footprint of data through specialized compression.
Distinguishing note: Focuses on the compression capability, distinct from the storage engine architecture.
Explore 117 awesome GitHub repositories matching data & databases · Data Compression Algorithms.
LevelDB is an embedded database library and persistent storage engine that provides a sorted key-value store. It uses a log-structured merge-tree architecture to map arbitrary byte-array keys to byte-array values, and it runs directly within the application process, so no separate server process is needed. The system is distinguished by its support for custom comparison functions that define key ordering, enabling efficient range scans and ordered lookups. It ensures data reliability through atomic batch writes, consistent snapshots, and log-based recovery after failures. The engine covers broad capab…
Reduces the disk space footprint by automatically compressing stored data.
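Besides compressing data blocks, LevelDB's table format also shrinks sorted keys by storing each one as the length of the prefix it shares with the previous key plus the remaining suffix. The Python sketch below illustrates only that idea. `encode_block` and `decode_block` are hypothetical names, and the real format's restart points and varint-encoded lengths are omitted.

```python
# Hypothetical helpers illustrating shared-prefix key encoding.
# Restart points and varint-encoded lengths (used by the real format) are omitted.


def encode_block(sorted_keys):
    """Encode sorted keys as (shared_prefix_length, suffix) pairs."""
    entries = []
    prev = b""
    for key in sorted_keys:
        # Count how many leading bytes this key shares with the previous key.
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def decode_block(entries):
    """Rebuild full keys by re-attaching each shared prefix."""
    keys = []
    prev = b""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = [b"user:1001", b"user:1002", b"user:1010", b"video:7"]
encoded = encode_block(keys)
print(encoded)
# [(0, b'user:1001'), (8, b'2'), (7, b'10'), (0, b'video:7')]
```

Because the keys are sorted, neighbors tend to share long prefixes, so most entries store only a few suffix bytes.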
RocksDB is a high-performance, embeddable persistent key-value library and storage engine based on Log-Structured Merge-trees. It is designed to provide durable storage for large-scale datasets, integrating directly into applications to manage data on flash and RAM-based hardware. The engine is distinguished by its focus on minimizing read and write amplification through multi-threaded compaction and custom memory allocators. It features specialized optimizations for flash storage, including support for zoned block devices, and provides the ability to extend store behavior via external plugin…
Reduces the storage footprint of keys and values using compression algorithms to balance CPU and disk usage.
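RocksDB exposes compression as a configurable option (including different settings per LSM level), which is how the CPU-versus-disk balance is tuned. The sketch below does not use RocksDB: it assumes Python's standard `zlib` as a stand-in and measures how compression level trades time for size on sample values. `compare_levels` is a hypothetical helper.

```python
import time
import zlib


def compare_levels(data, levels=(1, 6, 9)):
    """Return {level: (compressed_size, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Confirm the round trip so the size comparison is meaningful.
        assert zlib.decompress(compressed) == data
        results[level] = (len(compressed), elapsed)
    return results


# Repetitive, record-like values, similar to what a key-value store often holds.
values = b"".join(b"key=%06d,status=ok;" % i for i in range(5000))
for level, (size, secs) in compare_levels(values).items():
    print(f"level {level}: {size} of {len(values)} bytes in {secs * 1e3:.2f} ms")
```

Higher levels usually spend more CPU for smaller output; the right point depends on whether the workload is bound by disk or by CPU.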
k6 is a developer-centric load testing suite and command-line load generator designed for network performance validation. It functions as a JavaScript load testing tool that utilizes a Go-based runtime engine to simulate concurrent user traffic and validate API responses across HTTP, gRPC, and WebSockets. The project distinguishes itself by using code rather than a graphical interface to define workload scenarios and performance thresholds. It features a pluggable protocol architecture and an extension ecosystem that allows for the addition of custom protocols and specialized testing capabili…
Features a pluggable architecture that allows the addition of custom protocols to extend network testing capabilities.
This project is a computer science educational resource and a library of common data structures and algorithms implemented in Swift. It serves as a practical reference for studying complexity and efficiency through solved algorithmic problems and conceptual guides. The collection includes implementations of linear and hierarchical data structures, such as stacks, queues, linked lists, and trees. It covers a wide range of computational patterns, including graph and pathfinding implementations, mathematical numerical methods, and data compression techniques. The project also provides implement…
Implements encoding techniques to reduce data storage size and improve transmission efficiency.
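Run-length encoding is one of the simplest encodings of this kind: each run of identical symbols is stored once, together with a repeat count. The sketch below is a generic Python illustration, not code from the Swift project; `rle_encode` and `rle_decode` are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical helper names; a generic run-length encoding over bytes.


def rle_encode(data: bytes) -> List[Tuple[int, int]]:
    """Collapse each run of identical bytes into a (byte_value, count) pair."""
    runs: List[Tuple[int, int]] = []
    for byte in data:
        if runs and runs[-1][0] == byte:
            value, count = runs[-1]
            runs[-1] = (value, count + 1)
        else:
            runs.append((byte, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> bytes:
    """Expand each (byte_value, count) pair back into a run of bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)


runs = rle_encode(b"aaabccdddd")
print(runs)  # [(97, 3), (98, 1), (99, 2), (100, 4)]
```

RLE pays off only on data with long runs; on input without repeats, every symbol gains a count and the output grows.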
This project is a comprehensive collection of common computer science algorithms and data structures implemented in Swift. It serves as an educational reference and library for studying computational complexity, algorithmic logic, and data structure engineering through practical code examples. The repository provides a wide suite of data structure implementations, including various types of linked lists, heaps, hash tables, and an extensive range of hierarchical trees such as Red-Black, B-Tree, and Splay trees. It also covers diverse sorting and searching techniques, from basic bubble sort to…
Implements Huffman coding to reduce data size by assigning variable-length bit strings based on frequency.
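A compact Python sketch of Huffman coding follows; the listed project implements it in Swift, and this is an independent illustration with hypothetical function names. It repeatedly merges the two least frequent subtrees, so frequent symbols end up with shorter codes.

```python
import heapq
from collections import Counter

# Hypothetical helper names; an independent sketch, not the Swift project's code.


def huffman_codes(text):
    """Build a prefix-free code table where frequent symbols get shorter codes."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (total_frequency, unique_tiebreak, {symbol: partial_code}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1 to their codes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    return "".join(codes[sym] for sym in text)


def decode(bits, codes):
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        # The code is prefix-free, so the first match is the right symbol.
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)


codes = huffman_codes("abracadabra")
bits = encode("abracadabra", codes)
print(len(bits))  # 23
```

For "abracadabra" (a×5, b×2, r×2, c×1, d×1) the encoding takes 23 bits versus 88 bits at 8 bits per character. The exact codes depend on how ties are broken, but the total length does not.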
WeUI is a mobile web UI library and design system consisting of CSS components and HTML templates. It is specifically designed to replicate the visual identity and interface of the WeChat messaging ecosystem, providing a standardized set of components to build responsive mobile web interfaces. The library functions as a stateless component system, utilizing a pure CSS architecture and HTML templates that rely on external JavaScript for interactivity. It employs a BEM-based class naming convention to manage component nesting and prevent style leakage across complex layouts. The framework incl…
Uses visual tokens for identity, though not in the context of image tokenization.
Zstandard is a lossless data compression library and compressed data format designed for high compression ratios and fast real-time processing. It functions as a real-time data compressor and multi-threaded compression engine capable of distributing workloads across multiple CPU cores to increase throughput. The system supports dictionaries trained on sample data, which improve the compression ratio and speed for small files. It also provides long-distance pattern matching to identify repeated sequences across large files. The library covers a broad range of capabilities including st…
Restores compressed data to its original form while supporting legacy formats from earlier versions.
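Dictionary compression helps small inputs because the compressor can refer back to content it "saw" before the input began. To stay within the standard library, the sketch below swaps in `zlib`'s preset-dictionary feature (`zdict`), a simpler mechanism than Zstandard's trained dictionaries. The dictionary here is written by hand for an assumed record shape rather than trained, and `compress`/`decompress` are hypothetical helper names.

```python
import zlib

# Hand-written dictionary of substrings expected to recur in small JSON records.
# (Zstandard would train a dictionary from samples; this one is an assumption.)
DICTIONARY = b'{"event": "page_view", "user_id": , "country": "US", "device": "mobile"}'


def compress(record, zdict=None):
    # zdict primes the compressor's window with the dictionary contents.
    comp = zlib.compressobj(9, zdict=zdict) if zdict else zlib.compressobj(9)
    return comp.compress(record) + comp.flush()


def decompress(blob, zdict=None):
    # The same dictionary must be supplied again to decompress.
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


record = b'{"event": "page_view", "user_id": 4182, "country": "US", "device": "mobile"}'
plain = compress(record)
primed = compress(record, DICTIONARY)
print(len(record), len(plain), len(primed))
```

The dictionary travels out of band: both sides must hold the same bytes, which is why it only pays off when many small records share structure.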
Kratos is a toolkit for building cloud-native microservices in Go. It provides a comprehensive suite of framework primitives, including a dedicated toolset for API-first development using Protobuf to generate server and client code for gRPC and HTTP. The project is distinguished by its pluggable service infrastructure, which allows for the swapping of configuration stores, service registries, and data encoding formats. It utilizes a composable middleware pipeline to inject cross-cutting concerns such as authentication, request validation, and circuit breaking into the service flow. The frame…
Decouples core logic from external dependencies using a pluggable architecture based on Go interfaces.
SmartRefreshLayout is a pull-to-refresh framework and gesture interaction library for Android. It provides a scroll view wrapper that integrates interactive refresh headers and loading footers into mobile user interfaces, coordinating synchronized scrolling and touch events. The project features a pluggable architecture for custom refresh indicators and secondary refresh mechanisms. It utilizes damping-based gesture translation and overscroll rebound mechanisms to manage the tactile feel of drag resistance and physics-based animations. The library covers a broad range of interaction logic, i…
Provides a pluggable system for swapping out refresh indicators by implementing a common interface.
TDengine is a distributed time-series database designed for the high-speed ingestion, compression, and retrieval of timestamped metrics and sensor data. It functions as a SQL-compatible analytics engine, allowing users to perform complex operations on massive volumes of time-ordered information using standard relational syntax. The platform is built to serve as a backend foundation for industrial IoT environments, managing real-time data streams and device metadata through a cluster-based architecture. The system distinguishes itself through a distributed sharding architecture that uses consi…
Compresses data during storage and transmission using specialized algorithms to reduce storage footprint.
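Time-series data compresses well because timestamps are regular: storing the difference between consecutive values (delta encoding) leaves small, repetitive numbers that a general-purpose compressor shrinks easily. TDengine's own codecs are not reproduced here; the sketch assumes plain delta encoding followed by `zlib`, and the helper names and start time are hypothetical.

```python
import struct
import zlib

# Generic delta encoding + zlib; TDengine's own codecs are not reproduced here.


def delta_encode(values):
    """Keep the first value, then store each difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack(values):
    """Serialize as little-endian signed 64-bit integers, then compress."""
    return zlib.compress(struct.pack(f"<{len(values)}q", *values))


# One reading per second, timestamps in milliseconds (hypothetical start time).
ts = [1_700_000_000_000 + 1000 * i for i in range(10_000)]
raw = pack(ts)
deltas = pack(delta_encode(ts))
print(len(raw), len(deltas))
```

Dedicated time-series encoders typically go further (for example, deltas of deltas or bit-packing small integers), but the principle is the same.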
pkg is a Node.js executable packager and cross-platform binary compiler. It bundles a project and its dependencies into a single standalone binary, allowing applications to run on machines without a pre-installed Node.js runtime. The project distinguishes itself by precompiling JavaScript source code into bytecode to remove human-readable source text and obfuscate the logic. It utilizes a virtual filesystem bundler to embed static assets and non-JavaScript files directly into the executable, employing compression algorithms to reduce the final binary size. The tool covers cross-platform compilation for var…
Employs compression algorithms to reduce the final size of the embedded filesystem within the executable binary.
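A minimal sketch of the idea, assuming the embedded filesystem is modeled as a dictionary from virtual path to gzip-compressed bytes. The paths and helper names are hypothetical, and pkg's actual snapshot layout and command-line options are not reproduced.

```python
import gzip

# Hypothetical virtual-filesystem layout; not pkg's actual snapshot format.


def build_snapshot(files):
    """Map each virtual path to its gzip-compressed contents."""
    return {path: gzip.compress(data) for path, data in files.items()}


def read_file(snapshot, path):
    """Decompress one embedded file on demand."""
    return gzip.decompress(snapshot[path])


files = {
    "/app/index.js": b"console.log('hello');\n" * 200,
    "/app/config.json": b'{"port": 8080, "debug": false}\n',
}
snapshot = build_snapshot(files)
raw_size = sum(len(data) for data in files.values())
packed_size = sum(len(blob) for blob in snapshot.values())
print(raw_size, packed_size)
```

Compressing each file separately keeps random access cheap at the cost of a per-file header: the tiny `config.json` actually grows, while the repetitive script shrinks dramatically.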
MiniCPM-o is a multimodal large language model designed to function as a real-time conversational assistant on edge devices. By mapping text, image, video, and audio inputs into a unified latent space, the system enables simultaneous cross-modal reasoning and full-duplex interaction. It is built as an edge-side inference engine, utilizing quantized model weights to maintain high-performance processing on consumer hardware. The system distinguishes itself through its integrated speech synthesis and voice cloning capabilities, which allow for the generation of expressive, personalized vocal out…
Implements adaptive visual token compression to balance inference speed and accuracy on edge devices.
Passport is a Node.js authentication middleware designed to manage user identities and session states within web applications. It functions as a request identity verifier that secures application routes by validating user credentials before granting access. The system uses modular authentication strategies, so identity verification is handled by interchangeable plugins. This architecture supports the creation of custom authentication strategies for local credentials and the integration of federated identity providers using external protocols. The framework provides capabilities for sessi…
Implements a pluggable strategy registry to dynamically select authentication mechanisms during the request lifecycle.
This project is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of truth, it enables applications to remain functional without network connectivity, automatically reconciling changes with remote backends once a connection is restored. The database distinguishes itself through a modular architecture that supports cross-environment synchronization and high-performance d…
Reduces storage footprint by applying compression techniques to document keys.
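Key compression works because every document repeats the same field names. The sketch below assumes a fixed list of field names and maps each to a short alias. It is not this project's actual scheme or API: the `|a`-style aliases and helper names are hypothetical, and nested objects are not handled.

```python
import json
import string

# Hypothetical alias scheme; not the project's actual format. Nested objects are not handled.


def build_table(field_names):
    """Assign each field name a short alias ("|a", "|b", ...) in sorted order."""
    names = sorted(field_names)
    if len(names) > len(string.ascii_letters):
        raise ValueError("this sketch supports at most 52 fields")
    return {name: "|" + letter for name, letter in zip(names, string.ascii_letters)}


def compress_doc(doc, table):
    return {table[key]: value for key, value in doc.items()}


def decompress_doc(doc, table):
    reverse = {alias: name for name, alias in table.items()}
    return {reverse[key]: value for key, value in doc.items()}


table = build_table(["firstName", "lastName", "emailAddress", "createdAt"])
doc = {"firstName": "Ada", "lastName": "Lovelace",
       "emailAddress": "ada@example.com", "createdAt": 1815}
small = compress_doc(doc, table)
print(small)  # {'|c': 'Ada', '|d': 'Lovelace', '|b': 'ada@example.com', '|a': 1815}
print(len(json.dumps(doc)), len(json.dumps(small)))
```

Deriving the table from a shared schema means it never has to be stored alongside each document.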
DeepSeek-OCR is a vision processing framework designed to convert image-based text into machine-readable tokens for large language models. It functions as a document inference pipeline that encodes visual data into compact representations, enabling automated optical character recognition and document analysis workflows. The system distinguishes itself through a high-throughput architecture that utilizes hardware-accelerated batch inference to process large volumes of visual data. It incorporates dynamic resolution scaling to manage the balance between visual detail and token consumption, ensu…
Compresses image content into optimized token representations for visual analysis.
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network…
Applies compression algorithms to data before uploading to remote storage to reduce footprint.
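Compressing before upload usually means batching events and compressing the whole batch, since many small records share structure. The Python sketch assumes newline-delimited JSON and gzip; it does not use Vector's configuration or code, and `encode_batch`/`decode_batch` are hypothetical names.

```python
import gzip
import json

# Hypothetical batch encoder; not Vector's configuration or implementation.


def encode_batch(events):
    """Serialize events as newline-delimited JSON, then gzip the whole batch."""
    body = "".join(json.dumps(e, separators=(",", ":")) + "\n" for e in events)
    return gzip.compress(body.encode("utf-8"))


def decode_batch(blob):
    """Reverse encode_batch: gunzip, split lines, parse each JSON object."""
    lines = gzip.decompress(blob).decode("utf-8").splitlines()
    return [json.loads(line) for line in lines]


events = [{"host": "web-1", "level": "info", "message": f"request {i} served"}
          for i in range(1000)]
payload = encode_batch(events)
# A real pipeline would now upload `payload` to remote storage.
print(len(payload))
```

Larger batches compress better but add latency before upload, so batch size is itself a tuning knob.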
This project is a comprehensive framework and toolkit for developing, optimizing, and deploying transformer-based models across multimodal, document intelligence, and natural language processing tasks. It provides a unified neural architecture that processes text, vision, audio, and document layout data through a shared set of weights, enabling researchers and developers to build foundational models that align cross-modal representations. The platform distinguishes itself through advanced training and inference strategies designed for large-scale deep learning. It incorporates specialized mec…
Implements visual data tokenization to convert raw images into discrete tokens using encoder-decoder architectures.
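One common way to make such tokens discrete is vector quantization: each encoded patch vector is replaced by the index of its nearest codebook entry, and the decoder works from those indices. The sketch below shows only that lookup, with a tiny hand-written codebook; in real models the encoder and codebook are learned, and all names here are hypothetical.

```python
# Hypothetical names; a fixed toy codebook stands in for a learned one.


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (its token)."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(vec, codebook[i]))
        for vec in vectors
    ]


def dequantize(tokens, codebook):
    """Look tokens back up in the codebook (what a decoder would consume)."""
    return [codebook[t] for t in tokens]


# Toy "patch embeddings" (e.g. flattened 2x2 grayscale patches) and a 3-entry codebook.
codebook = [(0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0), (0.5, 0.5, 0.5, 0.5)]
patches = [(0.1, 0.0, 0.1, 0.0), (0.9, 1.0, 0.8, 1.0), (0.6, 0.4, 0.5, 0.5)]
tokens = quantize(patches, codebook)
print(tokens)  # [0, 1, 2]
```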
Qwen2-VL is a multimodal large language model and vision language model designed to process and reason across text, images, and video content. It functions as a visual reasoning engine and a visual agent framework, capable of interpreting visual data to perform object detection, document parsing, and spatial reasoning. The model is distinguished by its ability to act as a video understanding model, processing hour-long videos with second-level indexing and event recall. It further differentiates itself through a visual agent capability that interacts with software interfaces and robotic hardw…
Controls the resolution and pixel count of visual inputs to balance processing quality with memory constraints.
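One way to bound pixel count is to resize each image so its area falls between a minimum and a maximum budget while keeping both sides multiples of the model's patch grid. The sketch below is a generic illustration, not Qwen2-VL's preprocessing code; `fit_to_budget`, the 28-pixel grid, and the default budgets are assumptions chosen for the example.

```python
import math

# Generic sketch; the 28-pixel grid and default budgets are assumptions.


def fit_to_budget(height, width, factor=28,
                  min_pixels=56 * 56, max_pixels=28 * 28 * 1280):
    """Resize to multiples of `factor` with area between min_pixels and max_pixels."""
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    if h * w > max_pixels:
        # Shrink both sides by the same ratio, rounding down to stay within budget.
        scale = math.sqrt(height * width / max_pixels)
        h = max(factor, math.floor(height / scale / factor) * factor)
        w = max(factor, math.floor(width / scale / factor) * factor)
    elif h * w < min_pixels:
        # Enlarge both sides by the same ratio, rounding up to reach the minimum.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / factor) * factor
        w = math.ceil(width * scale / factor) * factor
    return h, w


print(fit_to_budget(4000, 3000))  # (1148, 840)
print(fit_to_budget(280, 560))    # (280, 560)
```

Raising the maximum budget preserves more detail at the cost of more visual tokens and memory; lowering it does the reverse.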
Temporal is a distributed workflow orchestration engine designed to manage fault-tolerant, stateful, and long-running background processes. It functions as a platform for coordinating complex cross-service operations, ensuring consistency and reliability in distributed environments by decoupling workflow orchestration from task execution. The platform distinguishes itself through a deterministic, event-sourced execution model that reconstructs workflow state by re-executing code from an immutable event log. This approach isolates non-deterministic side effects into managed activities, allowin…
Shrinks payload sizes during workflow execution to minimize storage space required for event histories.
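Temporal's SDKs let applications plug in a payload codec that transforms payloads before they are persisted, which is where such compression would sit. The sketch below does not use any Temporal SDK: `Payload` and `CompressionCodec` are hypothetical stand-ins, the `binary/zlib` encoding label is made up, and the 256-byte threshold is an assumption.

```python
import json
import zlib
from dataclasses import dataclass, field

# Hypothetical stand-ins; this is not a Temporal SDK API.
ENCODING = "binary/zlib"  # made-up encoding label


@dataclass
class Payload:
    data: bytes
    metadata: dict = field(default_factory=dict)


class CompressionCodec:
    """Compress payloads before they are persisted; restore them when read back."""

    MIN_SIZE = 256  # assumed threshold: tiny payloads rarely shrink

    def encode(self, payload):
        if len(payload.data) < self.MIN_SIZE:
            return payload
        meta = dict(payload.metadata)
        if "encoding" in meta:
            # Remember the original encoding so decode can restore it.
            meta["original-encoding"] = meta["encoding"]
        meta["encoding"] = ENCODING
        return Payload(zlib.compress(payload.data), meta)

    def decode(self, payload):
        if payload.metadata.get("encoding") != ENCODING:
            return payload
        meta = dict(payload.metadata)
        original = meta.pop("original-encoding", None)
        if original is None:
            del meta["encoding"]
        else:
            meta["encoding"] = original
        return Payload(zlib.decompress(payload.data), meta)


codec = CompressionCodec()
big = Payload(json.dumps({"items": ["item"] * 500}).encode(), {"encoding": "json/plain"})
stored = codec.encode(big)
print(len(big.data), len(stored.data))
```

Because every worker and client must decode what others encoded, the codec has to be configured identically everywhere that reads the history.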
Swagger Codegen is a template-driven engine and multi-language toolkit used to generate API client SDKs, server stubs, and human-readable documentation from OpenAPI specifications. It translates these specifications into functional libraries and boilerplate routing code across various target programming languages. The tool utilizes a pluggable generator module system and an integrated template engine, allowing for the customization of generated source code and the creation of new language-specific generators. It supports flexible specification sourcing via local files, remote HTTP endpoints, …
Organizes language-specific logic into discrete modules that implement a common interface for creating clients and servers.