28 repositories
Storage and retrieval layers optimized for high-throughput access to large-scale datasets.
Distinguishing note: Focuses on the infrastructure layer for data performance rather than general data management.
28 GitHub repositories matching Data & Databases · High-Performance Data Infrastructures.
This project is a comprehensive platform for quantitative investment research, machine learning, and algorithmic trading. It provides an end-to-end environment for developing, testing, and executing financial strategies, supporting the entire lifecycle from data ingestion and feature engineering to model training and backtesting. The system is distinguished by its configuration-driven workflow orchestration, which allows researchers to automate complex pipelines and manage experiments through declarative files. It features a high-performance data infrastructure that utilizes custom binary for
Maximizes throughput for large-scale financial datasets while ensuring point-in-time data integrity.
EasyExcel is a Java processing library designed for reading and writing XLS, XLSX, and CSV files. It functions as a memory-efficient spreadsheet parser, an object-relational mapper that binds spreadsheet columns to Java class fields, and a stream-based exporter for handling high-volume data. The library distinguishes itself through a streaming model that processes large files row-by-row via listeners to prevent heap memory overflow. It also operates as a template engine, allowing the population of predefined spreadsheet files with dynamic data while preserving original layouts and styles. Br
Reduces memory consumption during high-speed parsing through optimized data manipulation.
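The listener-driven streaming model described above can be sketched in a few lines. This is a Python illustration of the idea only, not EasyExcel's Java API: `RunningTotalListener`, `on_row`, `on_finished`, `read_stream`, and the `amount` column are hypothetical names, and CSV stands in for a spreadsheet.

```python
import csv
import io


class RunningTotalListener:
    """Hypothetical listener: aggregates rows instead of storing them."""

    def __init__(self):
        self.row_count = 0
        self.total = 0.0
        self.finished = False

    def on_row(self, row):
        # Called once per parsed row; the row can be discarded afterwards.
        self.row_count += 1
        self.total += float(row["amount"])

    def on_finished(self):
        # Called once after the last row has been delivered.
        self.finished = True


def read_stream(text_stream, listener):
    """Parse CSV text one row at a time and hand each row to the listener."""
    for row in csv.DictReader(text_stream):
        listener.on_row(row)
    listener.on_finished()


listener = RunningTotalListener()
read_stream(io.StringIO("name,amount\na,1.5\nb,2.5\n"), listener)
```

Because only one row is alive at a time, memory use stays roughly constant regardless of how many rows the file holds.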
Dapper is a high-performance micro-ORM and SQL object mapper for .NET. It functions as an ADO.NET extension library that adds data mapping capabilities directly to database connections, allowing SQL query results to be transformed into typed objects. The project prioritizes execution speed and low memory overhead by using intermediate language generation to map database columns to object properties. It further optimizes performance through the use of concurrent caching for mapping functions and literal value injection to improve database execution plans. The library covers a broad range of d
Prioritizes execution speed and low memory overhead using runtime IL generation for result parsing.
Fresco is an Android image loading library and cache manager designed to fetch, decode, and display images from network or local sources. It functions as a rendering engine for animated image formats and a streaming system for progressive image loading. The library distinguishes itself through specialized memory management that utilizes off-heap allocation to reduce garbage collection overhead and prevent out-of-memory errors. It includes a dedicated rendering pipeline for animated GIFs and WebP files and supports progressive JPEG decoding to render low-resolution versions of images while the
Places image data in specialized memory regions to increase processing speed and prevent out-of-memory errors.
This project is a curated directory of software, frameworks, and educational resources designed for building, scaling, and maintaining distributed data processing and storage architectures. It serves as a comprehensive index for the distributed computing ecosystem, helping users identify the appropriate tools for managing large-scale information systems. The repository functions as a central hub for data engineering, offering categorized access to technologies that support batch and stream processing, machine learning, and interactive querying. By organizing these resources, it assists in the
Maintains scalable, high-performance storage systems for structured and unstructured data across cloud environments.
Groupcache is a distributed caching library designed to coordinate data retrieval and storage across a cluster of nodes. It functions as a peer-to-peer data store that uses consistent hashing to assign specific keys to canonical owners, ensuring that cached items remain predictable and accessible throughout the network. The system distinguishes itself through a request coalescing engine that merges concurrent requests for the same missing key into a single upstream fetch. This mechanism prevents redundant backend load by ensuring that only one process retrieves the required data while sharing
Caches frequently accessed items across multiple nodes to eliminate system bottlenecks and maintain fast response times.
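Request coalescing of the kind described above can be sketched with threads. This is a Python illustration, not Groupcache's Go code: `SingleFlight`, `_Call`, and `do` are hypothetical names, and the sketch assumes callers for the same key all want the same result.

```python
import threading


class _Call:
    """State shared by every caller waiting on one in-flight fetch."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None
        self.waiters = 0


class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> _Call currently in flight

    def do(self, key, fn):
        """Run fn() once per key at a time; concurrent callers share the result."""
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
            else:
                call.waiters += 1
        if not leader:
            # Follower: wait for the leader's fetch instead of starting another.
            call.done.wait()
        else:
            try:
                call.result = fn()
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]
                call.done.set()
        if call.error is not None:
            raise call.error
        return call.result
```

The first caller for a key becomes the leader and performs the upstream fetch; everyone arriving before it finishes blocks on the same event and receives the same value or the same exception.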
EnTT is a C++ library designed for data-oriented design and entity component system architecture. It provides a framework for managing game objects and simulation states by separating entity data from logic, allowing for the efficient organization and manipulation of large collections of related data objects. The library utilizes sparse sets to store entities and components in contiguous memory, which facilitates cache-friendly iteration and constant-time lookups. It employs template metaprogramming for compile-time type reflection and type-erasure techniques to provide a unified interface fo
Optimizes CPU cache usage and system throughput for large sets of related data objects.
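A sparse set of the kind mentioned above can be sketched as follows. This is a simplified Python illustration, not EnTT's C++ implementation; the `SparseSet` class and its method names are hypothetical, and entity ids are assumed to be small non-negative integers.

```python
class SparseSet:
    """Maps entity ids to components stored in a packed (dense) array."""

    def __init__(self):
        self._sparse = []  # entity id -> index into _dense, or -1 if absent
        self._dense = []   # packed entity ids, no gaps
        self._values = []  # packed components, parallel to _dense

    def contains(self, entity):
        return entity < len(self._sparse) and self._sparse[entity] != -1

    def insert(self, entity, value):
        if self.contains(entity):
            self._values[self._sparse[entity]] = value
            return
        if entity >= len(self._sparse):
            self._sparse.extend([-1] * (entity + 1 - len(self._sparse)))
        self._sparse[entity] = len(self._dense)
        self._dense.append(entity)
        self._values.append(value)

    def get(self, entity):
        if not self.contains(entity):
            raise KeyError(entity)
        return self._values[self._sparse[entity]]

    def remove(self, entity):
        if not self.contains(entity):
            raise KeyError(entity)
        # Swap-and-pop: move the last element into the freed slot so the
        # dense arrays stay contiguous.
        idx = self._sparse[entity]
        last_entity = self._dense[-1]
        self._dense[idx] = last_entity
        self._values[idx] = self._values[-1]
        self._sparse[last_entity] = idx
        self._dense.pop()
        self._values.pop()
        self._sparse[entity] = -1

    def items(self):
        # Iteration walks the packed arrays only, never the sparse one.
        return list(zip(self._dense, self._values))
```

Lookup, insertion, and removal each touch a fixed number of array slots, while iteration visits only live components; the trade-off is that removal does not preserve insertion order.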
Coil is an image loading and caching pipeline designed for Android and Compose Multiplatform applications. It functions as a comprehensive loader, caching engine, and rendering utility that asynchronously fetches and displays images from network URLs, local storage, and multiplatform resource systems. The library distinguishes itself through a flexible fetcher-decoder pipeline and an interface-driven component registry, allowing for the integration of custom networking clients and decoders. It provides specialized support for rendering scalable vector graphics, animated formats such as GIF an
Reduces resource consumption through a combination of memory caching, disk storage, and downsampling.
This is a reference implementation library providing a collection of code samples, Transact-SQL scripts, and schemas for SQL Server, Azure SQL, and Azure Synapse. It focuses on providing standardized implementation patterns and reference code for building relational databases and cloud data warehouses. The library distinguishes itself by offering specialized guides and examples for deploying database instances within containerized environments and Azure cloud services. It includes specific reference databases and language extensions for integrating machine learning services and advanced analy
Implements memory-optimized processing techniques to increase transaction speed and reduce system response time.
dtm is a distributed transaction framework and polyglot transaction coordinator designed to maintain data consistency across microservices. It functions as a Saga orchestration engine and a two-phase message coordinator, ensuring that multi-service operations either succeed completely or roll back to a consistent state. The project distinguishes itself by supporting multiple consistency patterns, including Saga, TCC, XA, and outbox patterns, allowing users to select the appropriate model for their specific application requirements. It provides a polyglot integration layer via HTTP and gRPC, e
Processes thousands of requests per second by executing inventory checks and global transactions in memory.
MagicalRecord is a data persistence library and wrapper for Core Data that implements the Active Record pattern. It maps database rows directly to object instances, allowing for the creation, update, and retrieval of records without writing manual query logic. The project functions as a mapping layer that synchronizes object properties with a managed object context. It utilizes generic-based type resolution and model-class querying to enable data fetching directly on model classes, which removes the need for a separate external manager and reduces repetitive fetch request boilerplate. The li
Enables high-speed data retrieval using simplified single-line requests.
Storm is a distributed stream processing framework and fault-tolerant compute engine designed for executing real-time continuous computations across a cluster of machines. It functions as a stateful stream processor and cluster topology manager, enabling the deployment and monitoring of distributed data flow configurations. The system ensures exactly-once semantics by utilizing transactional state management to guarantee that every message in a data stream is processed exactly one time. It further operates as a distributed RPC system, allowing for the integration of non-native languages throu
Implements memory-optimized processing by bypassing serialization for intra-process communication.
FastImageCache is an iOS image caching library that provides a persistent disk-based image store. It utilizes a persistent bitmap cache to store images in uncompressed formats and incorporates an image pre-processing pipeline to optimize assets before they are committed to storage. The library optimizes rendering performance by using memory-mapped image tables for constant-time retrieval and byte-aligned data layouts to prevent memory copies. It organizes images of identical dimensions into shared tables and manages disk space through a least-recently-used cache eviction system. The project
Implements byte-aligned data layouts to prevent expensive memory copies and optimize rendering performance.
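The constant-time lookup in a memory-mapped table of equally sized images can be sketched as below. This is a Python illustration, not FastImageCache's Objective-C code: `ImageTable`, `entry_size`, and `capacity` are hypothetical names, and the sketch omits the LRU eviction and alignment padding the description mentions.

```python
import mmap


class ImageTable:
    """Fixed-size entries in one memory-mapped file; entry i starts at i * entry_size."""

    def __init__(self, path, entry_size, capacity):
        self.entry_size = entry_size
        self.capacity = capacity
        self._file = open(path, "w+b")
        self._file.truncate(entry_size * capacity)  # reserve the whole table
        self._map = mmap.mmap(self._file.fileno(), entry_size * capacity)

    def _offset(self, index):
        if not 0 <= index < self.capacity:
            raise IndexError(index)
        return index * self.entry_size

    def write(self, index, pixels):
        if len(pixels) != self.entry_size:
            raise ValueError("entry must be exactly entry_size bytes")
        start = self._offset(index)
        self._map[start:start + self.entry_size] = pixels

    def read(self, index):
        # Returns a view into the mapping rather than a copy of the bytes.
        start = self._offset(index)
        return memoryview(self._map)[start:start + self.entry_size]

    def close(self):
        # Callers must release any views returned by read() before closing.
        self._map.close()
        self._file.close()
```

Because every entry has the same size, locating an image is a single multiplication, and the returned view avoids copying the pixel data out of the mapping.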
asyncpg is an asynchronous database driver and binary protocol client for PostgreSQL. It provides a non-blocking interface for executing SQL statements, streaming result sets, and managing data transfer between an application and a PostgreSQL database. The driver implements the PostgreSQL binary protocol directly to facilitate efficient data transfer and type conversion. It includes a connection pool to maintain and reuse open database connections, reducing the latency associated with repeated handshakes. The project covers a broad range of database integration capabilities, including atomic
Implements cursor-based result streaming to retrieve large datasets while minimizing application memory overhead.
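Cursor-based streaming can be illustrated without a running PostgreSQL server by using the standard-library `sqlite3` module as a stand-in. This is not asyncpg's API; `stream_rows` and `batch_size` are hypothetical names, and the point is only that rows are pulled in small batches rather than materialized all at once.

```python
import sqlite3


def stream_rows(conn, query, batch_size=2):
    """Yield rows from a query while holding at most batch_size rows in memory."""
    cur = conn.execute(query)
    try:
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            yield from batch
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

# The consumer sees one row at a time; only the current batch is buffered.
total = sum(n for (n,) in stream_rows(conn, "SELECT n FROM t ORDER BY n"))
```

The batch size trades round trips against memory: larger batches mean fewer fetches but a bigger buffer.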
Compressor is an Android image compression library designed to reduce the file size and dimensions of images within mobile applications. It functions as a bitmap optimizer that adjusts image quality and formats to minimize storage footprints and improve network upload speeds. The library operates as an asynchronous image processor, utilizing background threads and reactive streams to compress high-resolution photos. This execution model prevents user interface freezes and maintains application responsiveness during heavy image manipulation tasks. The project covers a broad range of image opt
Uses optimized memory-mapped regions to decode high-resolution photos without triggering out-of-memory errors.
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Extracts data from structured response objects using helper methods to access rows, columns, and specific data types.
This is a collection of classical algorithms and data structures implemented as a header-only C++ library. It provides a suite of tools for general algorithm implementation, including data structure management, graph theory analysis, and string processing. The library is distinguished by its specialized toolkits for cryptographic hashing and encoding, featuring implementations of MD5, SHA-1, and Base64. It also includes advanced capabilities for high-performance string processing via suffix trees and arrays, as well as computational number theory for primality testing and arbitrary-precision
Uses specialized filters like Bloom filters to optimize data membership lookups.
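A Bloom filter for membership lookups can be sketched as follows. This is a Python illustration, not the library's C++ implementation; the `BloomFilter` class, its parameters, and the choice of salted SHA-256 as the hash family are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self._bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            # Derive independent-looking hashes by salting with the index.
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self._bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

A `False` answer lets the caller skip a more expensive lookup entirely, which is where the optimization comes from; a `True` answer still has to be confirmed against the real data.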
Zend Framework is a comprehensive collection of decoupled components for building modular, event-driven web applications. It implements an MVC architecture to separate business logic from the user interface and provides a structured system for handling requests through a sequential middleware pipeline. The project is distinguished by a factory-based dependency injection container that automates object creation and manages class lifecycles. It also includes a comprehensive security suite for authenticating user identities and restricting access to resources using access control lists (ACL) and role-based access control (RBAC) adapters. The framework covers a broad range of capabilities, including database abstraction via table and row gateways, remote procedure call (RPC) implementations for SOAP and JSON-RPC, and a console application framework for command-line interfaces. Additional areas include data serialization, input validation, session management, and tools for email delivery and content internationalization.
Employs techniques for fetching large datasets with minimal memory overhead to prevent crashes in constrained environments.
This repository provides a collection of reference implementations, toolkits, and orchestration tools for training and deploying large-scale AI models on Cloud TPU hardware. It serves as a framework for managing the lifecycle of accelerator clusters, including hardware orchestration and the provisioning of high-performance compute infrastructure for machine learning workloads. The project specifically enables the pre-training of foundation models, large language models, and complex reasoning architectures through distributed training toolkits and multi-host scaling recipes. It further provide
Connects optimized high-throughput disks to virtual machines for efficient training data access.
rkyv is a zero-copy deserialization framework for Rust that provides a binary serialization format for memory-mappable data archives. It allows complex data structures to be mapped to bytes and accessed directly from a buffer without allocating new memory or copying data. The project enables the serialization of polymorphic types and trait objects, maintaining their dynamic behavior and structure within the binary form. It utilizes relative-pointer addressing and byte-aligned structure packing to ensure data remains valid regardless of where it is loaded in memory. The framework covers high-
Enables high-throughput storage and retrieval of large datasets by eliminating traditional deserialization costs.
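The relative-pointer idea can be sketched in Python with `struct` and `memoryview`. This is an illustration only: the byte layout, `ROOT`, `archive`, and `access` are hypothetical and do not reproduce rkyv's actual format or Rust API.

```python
import struct

# Hypothetical root record: u32 id, i32 relative offset to the name, u32 name length.
ROOT = struct.Struct("<IiI")


def archive(record_id, name):
    """Serialize as [name bytes][padding][root]; the root goes last."""
    payload = name.encode("utf-8")
    pad = -len(payload) % 4                # keep the root 4-byte aligned
    root_pos = len(payload) + pad
    offset_field_pos = root_pos + 4        # the offset field follows the id
    rel = 0 - offset_field_pos             # name starts at archive position 0
    return payload + b"\0" * pad + ROOT.pack(record_id, rel, len(payload))


def access(buf, root_pos):
    """Read the record in place; the name is a view into buf, not a copy."""
    view = memoryview(buf)
    record_id, rel, length = ROOT.unpack_from(view, root_pos)
    start = root_pos + 4 + rel             # resolve relative to the field itself
    return record_id, view[start:start + length]


data = archive(7, "héllo")
root_pos = len(data) - ROOT.size
record_id, name_view = access(data, root_pos)

# Embedding the archive at another position leaves the offsets valid.
shifted = b"\xff" * 13 + data
shifted_id, shifted_name = access(shifted, 13 + root_pos)
```

Because the stored offset is measured from the pointer's own position rather than from the start of memory, the archive can be loaded or mapped anywhere without rewriting it.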