Why is microsoft/qlib a recommended High-Performance Data Infrastructures repository?

Maximizes throughput for large-scale financial datasets while ensuring point-in-time data integrity.

Why is alibaba/easyexcel a recommended High-Performance Data Infrastructures repository?

Reduces memory consumption during high-speed parsing through optimized data manipulation.

Why is StackExchange/Dapper a recommended High-Performance Data Infrastructures repository?

Prioritizes execution speed and low memory overhead using runtime IL generation for result parsing.

Why is facebook/fresco a recommended High-Performance Data Infrastructures repository?

Places image data in specialized memory regions to increase processing speed and prevent out-of-memory errors.

Why is oxnr/awesome-bigdata a recommended High-Performance Data Infrastructures repository?

Catalogs scalable, high-performance storage systems for structured and unstructured data across cloud environments.

Why is golang/groupcache a recommended High-Performance Data Infrastructures repository?

Caches frequently accessed items across multiple nodes to eliminate system bottlenecks and maintain fast response times.

Why is skypjack/entt a recommended High-Performance Data Infrastructures repository?

Optimizes CPU cache usage and system throughput for large sets of related data objects.

Why is coil-kt/coil a recommended High-Performance Data Infrastructures repository?

Reduces resource consumption through a combination of memory caching, disk storage, and downsampling.

Why is microsoft/sql-server-samples a recommended High-Performance Data Infrastructures repository?

Implements memory-optimized processing techniques to increase transaction speed and reduce system response time.

Why is dtm-labs/dtm a recommended High-Performance Data Infrastructures repository?

Processes thousands of requests per second by executing inventory checks and global transactions in memory.

28 repositories

Awesome GitHub Repositories · High-Performance Data Infrastructures

Storage and retrieval layers optimized for high-throughput access to large-scale datasets.

Distinguishing note: Focuses on the infrastructure layer for data performance rather than general data management.

Explore 28 awesome GitHub repositories matching data & databases · High-Performance Data Infrastructures. Refine with filters or upvote what's useful.


microsoft/qlib
44,490 stars
This project is a comprehensive platform for quantitative investment research, machine learning, and algorithmic trading. It provides an end-to-end environment for developing, testing, and executing financial strategies, supporting the entire lifecycle from data ingestion and feature engineering to model training and backtesting. The system is distinguished by its configuration-driven workflow orchestration, which allows researchers to automate complex pipelines and manage experiments through declarative files. It features a high-performance data infrastructure that utilizes custom binary for
Maximizes throughput for large-scale financial datasets while ensuring point-in-time data integrity.
Python · algorithmic-trading · auto-quant · deep-learning
alibaba/easyexcel
33,703 stars
EasyExcel is a Java processing library designed for reading and writing XLS, XLSX, and CSV files. It functions as a memory-efficient spreadsheet parser, an object-relational mapper that binds spreadsheet columns to Java class fields, and a stream-based exporter for handling high-volume data. The library distinguishes itself through a streaming model that processes large files row-by-row via listeners to prevent heap memory overflow. It also operates as a template engine, allowing the population of predefined spreadsheet files with dynamic data while preserving original layouts and styles. Br
Reduces memory consumption during high-speed parsing through optimized data manipulation.
Java · excel · java · jxl
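The row-by-row listener model in the description above can be sketched as follows. This is not EasyExcel's Java API: `RowListener`, `stream_rows`, and the sample columns are hypothetical names, and CSV stands in for the spreadsheet formats.

```python
import csv
import io
from typing import Callable, Dict, Iterable

# Hypothetical listener type: invoked once for every parsed row.
RowListener = Callable[[Dict[str, str]], None]


def stream_rows(source: Iterable[str], on_row: RowListener) -> int:
    """Parse CSV text line by line and hand each row to the listener.

    Only the current row is held at a time, so memory use does not grow
    with the number of rows in the source.
    """
    count = 0
    for row in csv.DictReader(source):
        on_row(row)
        count += 1
    return count


# Usage: aggregate one column without materializing the whole sheet.
totals = {"amount": 0}


def add_amount(row: Dict[str, str]) -> None:
    totals["amount"] += int(row["amount"])


sample = io.StringIO("name,amount\nalice,10\nbob,32\n")
rows_seen = stream_rows(sample, add_amount)
```

The listener decides what to keep, so a caller that only needs an aggregate never holds more than one row.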
StackExchange/Dapper
18,320 stars
Dapper is a high-performance micro-ORM and SQL object mapper for .NET. It functions as an ADO.NET extension library that adds data mapping capabilities directly to database connections, allowing SQL query results to be transformed into typed objects. The project prioritizes execution speed and low memory overhead by using intermediate language generation to map database columns to object properties. It further optimizes performance through the use of concurrent caching for mapping functions and literal value injection to improve database execution plans. The library covers a broad range of d
Prioritizes execution speed and low memory overhead using runtime IL generation for result parsing.
C#
facebook/fresco
17,149 stars
Fresco is an Android image loading library and cache manager designed to fetch, decode, and display images from network or local sources. It functions as a rendering engine for animated image formats and a streaming system for progressive image loading. The library distinguishes itself through specialized memory management that utilizes off-heap allocation to reduce garbage collection overhead and prevent out-of-memory errors. It includes a dedicated rendering pipeline for animated GIFs and WebP files and supports progressive JPEG decoding to render low-resolution versions of images while the
Places image data in specialized memory regions to increase processing speed and prevent out-of-memory errors.
Kotlin
oxnr/awesome-bigdata
14,454 stars
This project is a curated directory of software, frameworks, and educational resources designed for building, scaling, and maintaining distributed data processing and storage architectures. It serves as a comprehensive index for the distributed computing ecosystem, helping users identify the appropriate tools for managing large-scale information systems. The repository functions as a central hub for data engineering, offering categorized access to technologies that support batch and stream processing, machine learning, and interactive querying. By organizing these resources, it assists in the
Catalogs scalable, high-performance storage systems for structured and unstructured data across cloud environments.
awesome · awesome-list · bigdata
golang/groupcache
13,326 stars
Groupcache is a distributed caching library designed to coordinate data retrieval and storage across a cluster of nodes. It functions as a peer-to-peer data store that uses consistent hashing to assign specific keys to canonical owners, ensuring that cached items remain predictable and accessible throughout the network. The system distinguishes itself through a request coalescing engine that merges concurrent requests for the same missing key into a single upstream fetch. This mechanism prevents redundant backend load by ensuring that only one process retrieves the required data while sharing
Caches frequently accessed items across multiple nodes to eliminate system bottlenecks and maintain fast response times.
Go
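The request coalescing described above (concurrent requests for the same missing key share one upstream fetch) can be sketched with threads. This is a simplified illustration, not groupcache's Go implementation; `Coalescer`, `_Call`, and the `waiters` counter are hypothetical names.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """Bookkeeping for one in-flight fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None
        self.waiters = 0  # callers that joined an existing fetch


class Coalescer:
    """Merge concurrent fetches for the same key into a single call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fetch: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            if call is not None:
                call.waiters += 1
                leader = False
            else:
                call = _Call()
                self._inflight[key] = call
                leader = True
        if leader:
            # Only the first caller talks to the backend.
            try:
                call.value = fetch()
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            # Followers block until the leader publishes the result.
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.value


# Usage: a lone caller simply runs the fetch itself.
loader = Coalescer()
profile = loader.do("user:1", lambda: "profile-bytes")
```

Once the leader finishes, the key is removed from the in-flight table, so a later miss triggers a fresh fetch rather than reusing a stale result.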
skypjack/entt
12,294 stars
EnTT is a C++ library designed for data-oriented design and entity component system architecture. It provides a framework for managing game objects and simulation states by separating entity data from logic, allowing for the efficient organization and manipulation of large collections of related data objects. The library utilizes sparse sets to store entities and components in contiguous memory, which facilitates cache-friendly iteration and constant-time lookups. It employs template metaprogramming for compile-time type reflection and type-erasure techniques to provide a unified interface fo
Optimizes CPU cache usage and system throughput for large sets of related data objects.
C++ · architectural-patterns · cpp · cpp17
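A minimal sketch of the sparse-set layout mentioned above, assuming non-negative integer entity ids. It shows the indexing scheme only: Python lists hold references, so the cache-locality benefit of EnTT's contiguous C++ storage does not carry over. `SparseSet` and its methods are hypothetical names, not EnTT's API.

```python
from typing import Generic, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class SparseSet(Generic[T]):
    def __init__(self) -> None:
        self._sparse: List[int] = []  # entity id -> index into the packed arrays, -1 if absent
        self._dense: List[int] = []   # packed entity ids
        self._values: List[T] = []    # packed components, parallel to _dense

    def contains(self, entity: int) -> bool:
        return entity < len(self._sparse) and self._sparse[entity] != -1

    def insert(self, entity: int, value: T) -> None:
        if self.contains(entity):
            self._values[self._sparse[entity]] = value
            return
        if entity >= len(self._sparse):
            self._sparse.extend([-1] * (entity + 1 - len(self._sparse)))
        self._sparse[entity] = len(self._dense)
        self._dense.append(entity)
        self._values.append(value)

    def get(self, entity: int) -> Optional[T]:
        return self._values[self._sparse[entity]] if self.contains(entity) else None

    def remove(self, entity: int) -> None:
        if not self.contains(entity):
            return
        idx = self._sparse[entity]
        last_entity = self._dense[-1]
        # Swap the last packed element into the hole, then pop: no gaps remain.
        self._dense[idx] = last_entity
        self._values[idx] = self._values[-1]
        self._sparse[last_entity] = idx
        self._dense.pop()
        self._values.pop()
        self._sparse[entity] = -1

    def items(self) -> Iterator[Tuple[int, T]]:
        # Iteration walks the packed arrays only.
        return zip(self._dense, self._values)


positions: SparseSet[Tuple[int, int]] = SparseSet()
positions.insert(3, (0, 0))
positions.insert(7, (1, 2))
positions.insert(5, (4, 4))
positions.remove(3)
remaining = list(positions.items())
```

Lookup, insertion, and removal each touch a fixed number of array slots, which is where the constant-time behavior comes from.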
coil-kt/coil
11,819 stars
Coil is an image loading and caching pipeline designed for Android and Compose Multiplatform applications. It functions as a comprehensive loader, caching engine, and rendering utility that asynchronously fetches and displays images from network URLs, local storage, and multiplatform resource systems. The library distinguishes itself through a flexible fetcher-decoder pipeline and an interface-driven component registry, allowing for the integration of custom networking clients and decoders. It provides specialized support for rendering scalable vector graphics, animated formats such as GIF an
Reduces resource consumption through a combination of memory caching, disk storage, and downsampling.
Kotlin · android · androidx · compose
microsoft/sql-server-samples
11,122 stars
This is a reference implementation library providing a collection of code samples, Transact-SQL scripts, and schemas for SQL Server, Azure SQL, and Azure Synapse. It focuses on providing standardized implementation patterns and reference code for building relational databases and cloud data warehouses. The library distinguishes itself by offering specialized guides and examples for deploying database instances within containerized environments and Azure cloud services. It includes specific reference databases and language extensions for integrating machine learning services and advanced analy
Implements memory-optimized processing techniques to increase transaction speed and reduce system response time.
dtm-labs/dtm
10,881 stars
dtm is a distributed transaction framework and polyglot transaction coordinator designed to maintain data consistency across microservices. It functions as a Saga orchestration engine and a two-phase message coordinator, ensuring that multi-service operations either succeed completely or roll back to a consistent state. The project distinguishes itself by supporting multiple consistency patterns, including Saga, TCC, XA, and outbox patterns, allowing users to select the appropriate model for their specific application requirements. It provides a polyglot integration layer via HTTP and gRPC, e
Processes thousands of requests per second by executing inventory checks and global transactions in memory.
Go · cadence · csharp · database
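The Saga pattern named above (each step either completes or the completed steps are undone) can be sketched in a single process. dtm coordinates this across services over HTTP and gRPC; `run_saga`, `Step`, and the sample step names are hypothetical and only show the control flow.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the compensation that undoes it.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


events: List[str] = []


def decline_payment() -> None:
    raise RuntimeError("payment declined")


succeeded = run_saga([
    (lambda: events.append("reserve stock"), lambda: events.append("release stock")),
    (lambda: events.append("charge card"), lambda: events.append("refund card")),
    (decline_payment, lambda: events.append("never runs")),
])
```

The failing step's own compensation is skipped because the step never completed; only earlier steps are rolled back, newest first.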
magicalpanda/MagicalRecord
10,713 stars
MagicalRecord is a data persistence library and wrapper for Core Data that implements the Active Record pattern. It maps database rows directly to object instances, allowing for the creation, update, and retrieval of records without writing manual query logic. The project functions as a mapping layer that synchronizes object properties with a managed object context. It utilizes generic-based type resolution and model-class querying to enable data fetching directly on model classes, which removes the need for a separate external manager and reduces repetitive fetch request boilerplate. The li
Enables high-speed data retrieval using simplified single-line requests.
Objective-C
nathanmarz/storm
8,772 stars
Storm is a distributed stream processing framework and fault-tolerant compute engine designed for executing real-time continuous computations across a cluster of machines. It functions as a stateful stream processor and cluster topology manager, enabling the deployment and monitoring of distributed data flow configurations. The system ensures exactly-once semantics by utilizing transactional state management to guarantee that every message in a data stream is processed exactly one time. It further operates as a distributed RPC system, allowing for the integration of non-native languages throu
Implements memory-optimized processing by bypassing serialization for intra-process communication.
Java
path/FastImageCache
8,068 stars
FastImageCache is an iOS image caching library that provides a persistent disk-based image store. It utilizes a persistent bitmap cache to store images in uncompressed formats and incorporates an image pre-processing pipeline to optimize assets before they are committed to storage. The library optimizes rendering performance by using memory-mapped image tables for constant-time retrieval and byte-aligned data layouts to prevent memory copies. It organizes images of identical dimensions into shared tables and manages disk space through a least-recently-used cache eviction system. The project
Implements byte-aligned data layouts to prevent expensive memory copies and optimize rendering performance.
Objective-C
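The memory-mapped table idea above (entries of identical size, located by index without copying) can be sketched with Python's `mmap`. The 64-byte `ENTRY_SIZE`, `write_table`, and `ImageTable` are hypothetical; a real table would size entries from the image dimensions and alignment requirements.

```python
import mmap
import os
import tempfile
from typing import List

ENTRY_SIZE = 64  # hypothetical fixed entry size in bytes


def write_table(path: str, entries: List[bytes]) -> None:
    """Store every entry padded to ENTRY_SIZE so entry i starts at i * ENTRY_SIZE."""
    with open(path, "wb") as f:
        for data in entries:
            if len(data) > ENTRY_SIZE:
                raise ValueError("entry larger than ENTRY_SIZE")
            f.write(data.ljust(ENTRY_SIZE, b"\0"))


class ImageTable:
    def __init__(self, path: str) -> None:
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._view = memoryview(self._map)

    def entry(self, index: int) -> memoryview:
        """Return a view over entry `index`: offset arithmetic only, no copy."""
        start = index * ENTRY_SIZE
        if index < 0 or start + ENTRY_SIZE > len(self._view):
            raise IndexError(index)
        return self._view[start:start + ENTRY_SIZE]

    def close(self) -> None:
        # All views handed out by entry() must be released before closing.
        self._view.release()
        self._map.close()
        self._file.close()


tmp_dir = tempfile.mkdtemp()
table_path = os.path.join(tmp_dir, "images.table")
write_table(table_path, [b"image-0", b"image-1", b"image-2"])

table = ImageTable(table_path)
with table.entry(1) as view:
    second = view.tobytes().rstrip(b"\0")  # copied here only to inspect the bytes
table.close()
os.remove(table_path)
os.rmdir(tmp_dir)
```

Because every entry has the same size, finding entry `i` is a multiplication, and the operating system pages the bytes in on first access.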
MagicStack/asyncpg
7,953 stars
asyncpg is an asynchronous database driver and binary protocol client for PostgreSQL. It provides a non-blocking interface for executing SQL statements, streaming result sets, and managing data transfer between an application and a PostgreSQL database. The driver implements the PostgreSQL binary protocol directly to facilitate efficient data transfer and type conversion. It includes a connection pool to maintain and reuse open database connections, reducing the latency associated with repeated handshakes. The project covers a broad range of database integration capabilities, including atomic
Implements cursor-based result streaming to retrieve large datasets while minimizing application memory overhead.
Python · async-programming · async-python · asyncio
zetbaitsu/Compressor
7,222 stars
Compressor is an Android image compression library designed to reduce the file size and dimensions of images within mobile applications. It functions as a bitmap optimizer that adjusts image quality and formats to minimize storage footprints and improve network upload speeds. The library operates as an asynchronous image processor, utilizing background threads and reactive streams to compress high-resolution photos. This execution model prevents user interface freezes and maintains application responsiveness during heavy image manipulation tasks. The project covers a broad range of image opt
Uses optimized memory-mapped regions to decode high-resolution photos without triggering out-of-memory errors.
Kotlin
apache/pinot
6,098 stars
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Extracts data from structured response objects using helper methods to access rows, columns, and specific data types.
Java
xtaci/algorithms
5,454 stars
This is a collection of classical algorithms and data structures implemented as a header-only C++ library. It provides a suite of tools for general algorithm implementation, including data structure management, graph theory analysis, and string processing. The library is distinguished by its specialized toolkits for cryptographic hashing and encoding, featuring implementations of MD5, SHA-1, and Base64. It also includes advanced capabilities for high-performance string processing via suffix trees and arrays, as well as computational number theory for primality testing and arbitrary-precision
Uses specialized filters like Bloom filters to optimize data membership lookups.
C++
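A Bloom filter, as named in the summary above, answers "possibly present" or "definitely absent" using a fixed bit array. The sketch below is a generic Python version, not the repository's C++ code; the default sizes and the SHA-256 double-hashing scheme are assumptions chosen for brevity.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self._bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive k bit positions from one digest via double hashing.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        """False means definitely absent; True may be a false positive."""
        return all(
            self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter()
for key in (b"qlib", b"pinot", b"rkyv"):
    seen.add(key)
```

A negative answer lets a caller skip the expensive lookup entirely; a positive answer still has to be confirmed against the real store.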
zendframework/zendframework
5,441 stars
Zend Framework is a comprehensive collection of decoupled components for building modular, event-driven web applications. It implements an MVC architecture to separate business logic from the user interface and provides a structured system for handling requests through a sequential middleware pipeline. The project is distinguished by a factory-based dependency injection container that automates object creation and manages class lifecycles. It also includes a comprehensive security suite for verifying user identities and restricting access to resources using access control lists (ACL) and role-based access control (RBAC) adapters. The framework covers a broad range of capabilities, including database abstraction via table and row gateways, remote procedure call (RPC) implementations for SOAP and JSON-RPC, and a console application framework for command-line interfaces. Additional areas include data serialization, input validation, session management, and tools for email delivery and content internationalization.
Employs techniques for fetching large datasets with minimal memory overhead to prevent crashes in constrained environments.
tensorflow/tpu
5,281 stars
This repository provides a collection of reference implementations, toolkits, and orchestration tools for training and deploying large-scale AI models on Cloud TPU hardware. It serves as a framework for managing the lifecycle of accelerator clusters, including hardware orchestration and the provisioning of high-performance compute infrastructure for machine learning workloads. The project specifically enables the pre-training of foundation models, large language models, and complex reasoning architectures through distributed training toolkits and multi-host scaling recipes. It further provide
Connects optimized high-throughput disks to virtual machines for efficient training data access.
Jupyter Notebook
rkyv/rkyv
4,267 stars
rkyv is a zero-copy deserialization framework for Rust that provides a binary serialization format for memory-mappable data archives. It allows complex data structures to be mapped to bytes and accessed directly from a buffer without allocating new memory or copying data. The project enables the serialization of polymorphic types and trait objects, maintaining their dynamic behavior and structure within the binary form. It utilizes relative-pointer addressing and byte-aligned structure packing to ensure data remains valid regardless of where it is loaded in memory. The framework covers high-
Enables high-throughput storage and retrieval of large datasets by eliminating traditional deserialization costs.
Rust · rust · serialization · zero-copy
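The zero-copy access with relative pointers described above can be illustrated with a toy layout read straight out of a buffer. The byte layout, `archive`, and `ArchivedRecord` are hypothetical and are not rkyv's actual format or API.

```python
import struct

# Hypothetical little-endian layout:
#   offset 0: u32 record id
#   offset 4: i32 relative pointer to the name bytes, measured from offset 4
#   offset 8: u32 name length in bytes
HEADER = struct.Struct("<IiI")


def archive(record_id: int, name: str) -> bytes:
    payload = name.encode("utf-8")
    # The name is stored right after the header; express that relative to
    # the pointer field's own position so the record can live anywhere.
    rel = HEADER.size - 4
    return HEADER.pack(record_id, rel, len(payload)) + payload


class ArchivedRecord:
    """Reads fields in place from a buffer; nothing is parsed up front."""

    def __init__(self, buffer: bytes, base: int = 0) -> None:
        self._buf = memoryview(buffer)
        self._base = base

    @property
    def id(self) -> int:
        return struct.unpack_from("<I", self._buf, self._base)[0]

    @property
    def name(self) -> memoryview:
        rel, length = struct.unpack_from("<iI", self._buf, self._base + 4)
        start = self._base + 4 + rel
        return self._buf[start:start + length]  # a view, not a copy


# The record sits at an arbitrary position inside a larger buffer.
blob = b"\xff" * 5 + archive(7, "qlib") + b"\xff" * 3
record = ArchivedRecord(blob, base=5)
record_id = record.id
record_name = bytes(record.name)  # copied here only for comparison
```

Because the pointer is relative to its own location, the same bytes stay valid wherever the buffer is loaded, for example inside a memory-mapped file.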

Awesome High-Performance Data Infrastructures GitHub Repositories

microsoft/qlib

alibaba/easyexcel

StackExchange/Dapper

facebook/fresco

oxnr/awesome-bigdata

golang/groupcache

skypjack/entt

coil-kt/coil

microsoft/sql-server-samples

dtm-labs/dtm

magicalpanda/MagicalRecord

nathanmarz/storm

path/FastImageCache

MagicStack/asyncpg

zetbaitsu/Compressor

apache/pinot

xtaci/algorithms

zendframework/zendframework

tensorflow/tpu

rkyv/rkyv
