21 repositories
Programming components that provide sequential access to elements within a large data collection during processing.
Explore 21 GitHub repositories matching Data & Databases · Data Iterators.
Developer Roadmap is a community-driven platform that provides structured, graph-based learning paths for software engineering. It serves as a comprehensive knowledge repository where technical domains are organized into visual sequences to guide skill acquisition and career growth. The project distinguishes itself through a collaborative ecosystem that lets users contribute roadmaps, curate industry best practices, and maintain professional profiles. It integrates diagnostic assessment frameworks to evaluate technical competence, helping developers identify knowledge gaps and prepare for job interviews through targeted learning sequences. Beyond its core mapping capabilities, the platform offers practical project ideas and interactive tutoring to reinforce engineering concepts. It provides a centralized space for the community to share resources, track progressive skill development, and navigate complex technical landscapes.
Provides sequential access to elements within large data collections during processing.
Faceswap is a comprehensive framework for automated media manipulation and neural face synthesis. It provides a modular pipeline that manages the entire lifecycle of facial feature extraction, deep learning model training, and image conversion. By coordinating complex computer vision workflows, the system enables users to map facial identities between source and destination datasets while maintaining structural alignment and lighting consistency across video frames. The project distinguishes itself through a highly extensible plugin-based architecture that handles hardware-accelerated process
Serves as a base class for plugins to ingest and pass information through the extraction pipeline.
LevelDB is an embedded database library and persistent storage engine that provides a sorted key-value store. It uses a log-structured merge-tree architecture to map byte arrays to values, running directly within a process to provide storage without the need for a separate server process. The system is distinguished by its use of custom comparison functions to define key ordering, enabling efficient range scans and sequenced lookups. It ensures data reliability through atomic batch execution, consistent snapshot generation, and log-based recovery after failures. The engine covers broad capab
Provides sequential iterators for traversing stored entries in forward or backward order.
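The idea of walking a sorted key-value store in either direction can be sketched in a few lines of Python. This is a minimal illustration only: `SortedStore`, `put`, and `scan` are hypothetical names invented for the example, the store lives in memory, and none of it reflects LevelDB's actual API or on-disk format.

```python
import bisect
from typing import Dict, Iterator, List, Tuple


class SortedStore:
    """Hypothetical in-memory stand-in for a sorted key-value store."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []          # kept in sorted order
        self._values: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)    # insert while preserving order
        self._values[key] = value

    def scan(self, start: bytes = b"", reverse: bool = False) -> Iterator[Tuple[bytes, bytes]]:
        """Lazily yield (key, value) pairs with key >= start.

        Forward scans go from `start` to the last key; reverse scans go
        from the last key back down to `start`.
        """
        lo = bisect.bisect_left(self._keys, start)
        if reverse:
            indices = range(len(self._keys) - 1, lo - 1, -1)
        else:
            indices = range(lo, len(self._keys))
        for i in indices:
            key = self._keys[i]
            yield key, self._values[key]


store = SortedStore()
for k in (b"b", b"a", b"d", b"c"):
    store.put(k, k.upper())

forward = [k for k, _ in store.scan(start=b"b")]     # [b"b", b"c", b"d"]
backward = [k for k, _ in store.scan(reverse=True)]  # [b"d", b"c", b"b", b"a"]
```

Because `scan` is a generator, a caller that stops early never touches the remaining entries.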
Immutable.js is a library of persistent data structures and a functional state management toolkit. It provides a collection of immutable objects and arrays that prevent direct mutation to ensure predictable state management in JavaScript applications. The library utilizes structural sharing to efficiently create new versions of data without full copying and implements lazy sequence processing to chain data transformations that execute only when values are requested. It also supports batch mutation processing, allowing multiple changes to be applied to a temporary mutable copy before returning
Implements memory-efficient lazy iterators that defer data transformations until values are explicitly requested.
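Deferred transformation is easy to observe with Python's built-in lazy `map` and `filter`. The sketch below is an analogue of the behavior described, not Immutable.js code; the `calls` list and `square` helper exist only to record which inputs were actually processed.

```python
from itertools import islice

calls = []  # records every input that was actually transformed


def square(x: int) -> int:
    calls.append(x)
    return x * x


numbers = range(1, 1_000_000)

# Building the chain performs no work: map and filter are both lazy.
pipeline = filter(lambda y: y % 2 == 1, map(square, numbers))
work_done_before_request = len(calls)  # 0

# Requesting three values pulls only as many inputs as needed.
first_three = list(islice(pipeline, 3))  # [1, 9, 25]
inputs_touched = list(calls)             # [1, 2, 3, 4, 5]
```

Only five of the nearly one million inputs are squared, because the chain stops as soon as the third odd square is produced.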
Datasets is a library designed for the management, processing, and sharing of large-scale data collections for machine learning workflows. It functions as both a data processing framework and a versioning platform, providing tools to organize, filter, and transform massive datasets while ensuring reproducibility across research and development teams. The library distinguishes itself by enabling the handling of datasets that exceed available system memory. It utilizes memory-mapped file access, disk-based caching, and lazy iterative streaming to maintain performance when working with large-sca
Implements lazy, memory-efficient iterators to process large datasets on demand without loading them into physical memory.
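As a rough illustration of iterating over memory-mapped data, the following sketch uses only the Python standard library. It makes no claim about how Datasets is implemented: the fixed-width record layout, `write_records`, and `iter_records` are assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile
from typing import Iterable, Iterator

RECORD = struct.Struct("<q")  # assumed layout: one little-endian 64-bit integer per record


def write_records(path: str, values: Iterable[int]) -> None:
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def iter_records(path: str) -> Iterator[int]:
    """Yield records one at a time from a memory-mapped file.

    The operating system pages data in on demand, so the file is never
    read into a Python object as a whole.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for offset in range(0, len(mm), RECORD.size):
            yield RECORD.unpack_from(mm, offset)[0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.bin")
    write_records(path, range(10_000))
    total = sum(iter_records(path))  # consumes the iterator one record at a time
```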
This library is a collection of generic utilities for the Go programming language designed to simplify the manipulation of slices and maps. It provides a functional toolkit that enables developers to perform data transformations, such as filtering, mapping, and reducing, while maintaining strict type safety through the use of language-level generics. The project distinguishes itself by offering a dual approach to data processing that balances functional programming patterns with performance-oriented execution. It supports both immutable functional pipelines for predictable state transitions a
Provides a comprehensive toolkit for memory-efficient, lazy data traversal and deferred computation of large or infinite sequences in Go.
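Traversal of infinite sequences only works when elements are produced on demand. The sketch below shows the idea with Python generators rather than Go; `fibonacci` is a hypothetical helper and nothing here is taken from the library itself.

```python
from itertools import count, islice, takewhile
from typing import Iterator


def fibonacci() -> Iterator[int]:
    """An infinite sequence: each value is computed only when requested."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Take a finite prefix of an infinite sequence.
first_ten = list(islice(fibonacci(), 10))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Chain a lazy filter and map over an infinite counter, stopping on a condition.
even_squares = (n * n for n in count(0) if n % 2 == 0)
small_even_squares = list(takewhile(lambda v: v < 100, even_squares))
# [0, 4, 16, 36, 64]
```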
Excelize is a library for reading and writing spreadsheet files in the Office Open XML format. It provides a comprehensive suite of tools for programmatically creating, modifying, and analyzing workbooks, worksheets, and cell data, ensuring compatibility across various office software suites through structured XML serialization. The library distinguishes itself with a built-in formula calculation engine that evaluates complex mathematical and logical expressions directly against workbook data. It also features a memory-mapped streaming architecture, which allows for the efficient processing o
Emits data iteratively to maintain low memory usage during large-scale file processing.
Gensim is an unsupervised natural language processing toolkit designed for topic modeling, word embedding training, and the processing of large-scale text corpora. It provides a framework for discovering latent themes and semantic structures in text without the need for labeled data. The toolkit is distinguished by its ability to handle datasets that exceed system memory through iterator-based data streaming from disk. It also supports distributed model training, allowing complex modeling tasks to be executed across computer clusters. The library covers a broad range of analysis capabilities
Implements data iterators to stream large text collections from disk, avoiding memory exhaustion.
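Streaming a corpus from disk usually comes down to an object whose `__iter__` reopens the file and yields one document at a time. The following is a minimal sketch of that pattern; `LineCorpus` and the one-document-per-line format are assumptions for illustration, not part of Gensim's API.

```python
import os
import tempfile
from typing import Iterator, List


class LineCorpus:
    """Hypothetical corpus: one document per line, tokenized on whitespace."""

    def __init__(self, path: str) -> None:
        self.path = path

    def __iter__(self) -> Iterator[List[str]]:
        # Reopening the file on every call makes the corpus re-iterable,
        # so a consumer can make several passes without holding it in memory.
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield line.lower().split()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Human machine interface\nGraph of trees\nGraph minors survey\n")

    corpus = LineCorpus(path)
    first_pass = list(corpus)
    second_pass = list(corpus)  # a second full pass works because __iter__ reopens the file
```

A plain generator function would be exhausted after one pass, which is why the class form is preferable for consumers that iterate more than once.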
Home Assistant is a local home automation platform and server that acts as an IoT device orchestrator. It integrates diverse smart home hardware by wrapping third-party APIs into a standardized logic layer and stores all system state and historical statistics on local hardware to eliminate cloud dependencies. The system functions as a Matter IoT controller and an MQTT home automation bridge, allowing for local interoperability between different manufacturers. It features a state-based entity model and an internal event bus that decouple physical device logic from system automation. The platf
Converts lazy sequences produced by filters into static lists to enable counting and sorting.
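The need to materialize a lazy sequence before counting or sorting can be shown in plain Python. This is a generic illustration of the idea, not Home Assistant code; the `states` data is made up for the example.

```python
# Hypothetical entity states, invented for this example.
states = [
    {"name": "kitchen", "state": "on"},
    {"name": "hall", "state": "off"},
    {"name": "attic", "state": "on"},
]

lit = filter(lambda s: s["state"] == "on", states)  # lazy: nothing evaluated yet

# A lazy iterator has no length, so counting it directly fails.
try:
    len(lit)
    lazy_has_length = True
except TypeError:
    lazy_has_length = False

# Materializing it as a list enables counting and sorting.
lit_list = list(lit)
lit_count = len(lit_list)                          # 2
lit_names = sorted(s["name"] for s in lit_list)    # ["attic", "kitchen"]
```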
EASTL is a C++ Standard Template Library implementation consisting of containers, iterators, and algorithms. It provides cross-platform data structures and a template-based algorithm library designed for use in resource-constrained game engine environments. The library focuses on game engine memory management, providing specialized utilities that ensure predictable memory allocation and high-performance access for real-time applications. These containers maintain consistent behavior across different operating systems and hardware platforms. The project covers high-performance C++ development
Provides standardized iterators for traversing diverse data collections without exposing underlying memory layouts.
This project is an educational resource and a collection of instructional materials for performing data manipulation and statistical analysis using Python. It provides a comprehensive set of guides and code examples for using the Pandas, NumPy, and Matplotlib libraries to analyze structured data. The resource includes a dedicated guide for reshaping, cleaning, and aggregating tabular data and time series via Pandas, alongside a reference for high-performance vectorized operations and linear algebra using NumPy. It also features tutorials for creating publication-quality charts, distribution p
Uses generators to produce sequences of values on demand, reducing memory consumption for large datasets.
Node.js is an open-source, cross-platform JavaScript runtime environment built on the V8 engine, designed for executing JavaScript code outside a web browser. It operates as a server-side JavaScript platform with an event-driven, non-blocking I/O architecture that enables building scalable network applications and web servers. The runtime integrates the CommonJS module system for synchronous module loading and the npm ecosystem for sharing and reusing packages. The platform provides comprehensive capabilities for web server development, including creating HTTP and HTTPS servers, managing HTTP
Supports processing streaming data with async iterators for chunk-by-chunk consumption without full buffering.
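Chunk-by-chunk consumption with an async iterator looks roughly like the following. The sketch uses Python's `asyncio` as an analogue of the pattern rather than Node.js itself; `read_chunks` is a hypothetical source that slices an in-memory buffer, standing in for a real stream.

```python
import asyncio
from typing import AsyncIterator, Tuple


async def read_chunks(data: bytes, chunk_size: int) -> AsyncIterator[bytes]:
    """Hypothetical stream: yields fixed-size chunks, pausing between them."""
    for start in range(0, len(data), chunk_size):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield data[start:start + chunk_size]


async def summarize(data: bytes, chunk_size: int) -> Tuple[int, int]:
    total = 0
    largest = 0
    # Only one chunk is held by the consumer at any moment.
    async for chunk in read_chunks(data, chunk_size):
        total += len(chunk)
        largest = max(largest, len(chunk))
    return total, largest


total_bytes, largest_chunk = asyncio.run(summarize(b"x" * 1000, 64))
# total_bytes == 1000, largest_chunk == 64
```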
Lazy.js is a JavaScript library that implements a lazy evaluation model for processing collections and data streams. It defers all computation until iteration begins, building chains of transformations that execute only when values are consumed, avoiding intermediate arrays and buffering. The library wraps data sources into a uniform sequence interface, enabling operations like map and filter to be chained together without materializing intermediate results. The library extends lazy processing beyond simple collections to handle asynchronous data sources, DOM events, strings, and Node.js stre
Integrates with asynchronous data sources by yielding values at timed intervals or from streams without blocking.
r4ds is a data science curriculum and educational resource designed for mastering the R programming language. It provides a structured learning path for the end-to-end process of importing, cleaning, transforming, and visualizing data. The project emphasizes a reproducible data science guide and a comprehensive curriculum for data wrangling. It includes specialized tutorials on the grammar of graphics for layered data visualization and technical publications built with Quarto that combine executable code with narrative prose. The material covers a wide range of analytical capabilities, including data ingestion from diverse sources, relational data joins, and the management of categorical variables. It also addresses data cleaning, mathematical modeling, and the generation of professional reports and presentations in multiple formats. The curriculum focuses on the practical application of functional programming and tidy data principles to create transparent, repeatable analyses.
Demonstrates how to apply a consistent set of actions across data collections using functional programming.
Toolz is a Python library that implements functional programming utilities for iterable transformation, dictionary manipulation, function composition, and lazy evaluation. It provides a set of pure functions designed to work with Python's built-in data structures, enabling concise and composable data processing workflows. What distinguishes toolz is its support for curried partial application, allowing functions to be incrementally applied and reused. It includes dictionary-centric operations that handle nested structures, and offers iterable chain transformers that combine mapping, filtering
Processes sequences on-demand using generators for memory-efficient handling of large data streams.
Slonik is a type-safe PostgreSQL client for Node.js that uses tagged template literals to ensure parameters are bound and protected against injection attacks. It provides a framework for connecting applications to PostgreSQL with automatic type checking for queries and database schemas. The project distinguishes itself through a specialized SQL query linter that detects invalid columns and type mismatches by checking code against a live database schema during development. It also includes a high-performance binary bulk inserter for loading large datasets using native binary serialization, and a connection pool manager capable of dynamic query routing between primary and replica nodes. The library covers a broad set of database capabilities, including atomic transaction management, dynamic SQL query construction, and processing of large result sets through async-iterable streaming. It also provides middleware interceptors for logging and benchmarking, custom type parsing, and asynchronous callback mechanisms for refreshing database authentication credentials.
Provides memory-efficient processing of large database result sets using async iterable streams.
Ignite is a high-level training framework for neural networks in PyTorch that serves as a training engine and deep learning lifecycle manager. It provides a structured system for organizing and automating training and evaluation loops, managing data iterators and triggering event handlers at specific milestones during model training. The project distinguishes itself through a comprehensive suite of tools for distributed training and model evaluation. It includes utilities for synchronizing gradients and coordinating collective communication across multiple GPUs or nodes, as well as an evaluation suite for computing performance metrics and performing k-fold cross-validation. Its broader capabilities cover training workflow automation, including learning rate scheduling, early stopping, and hyperparameter optimization. The framework also provides observability tools for experiment tracking, runtime profiling, and mixed-precision training to optimize memory usage. State persistence mechanisms are included for managing model checkpoints and resuming training sessions. Containerized environments are available to simplify deployment and environment setup.
Controls finite or infinite data streams by determining epoch lengths or restarting exhausted iterators.
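Decoupling epoch length from dataset size means the loop has to restart the iterator whenever it runs dry. The sketch below shows one way to do that in plain Python; `run_epochs` and its parameters are hypothetical names for illustration and do not reproduce Ignite's engine.

```python
from typing import Iterable, List


def run_epochs(data: Iterable, epoch_length: int, max_epochs: int) -> List[list]:
    """Collect `epoch_length` batches per epoch, restarting `data` when exhausted.

    `data` must be re-iterable (a list or loader object, not a one-shot generator).
    """
    iterator = iter(data)
    epochs = []
    for _ in range(max_epochs):
        batches = []
        while len(batches) < epoch_length:
            try:
                batches.append(next(iterator))
            except StopIteration:
                iterator = iter(data)  # exhausted mid-epoch: start over
                try:
                    batches.append(next(iterator))
                except StopIteration:
                    raise ValueError("data is empty or cannot be restarted") from None
        epochs.append(batches)
    return epochs


epochs = run_epochs(["a", "b", "c"], epoch_length=2, max_epochs=3)
# [["a", "b"], ["c", "a"], ["b", "c"]]
```

With three batches and an epoch length of two, the second epoch straddles the restart, so epoch boundaries no longer line up with passes over the data.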
This is a typed server-side library and payment gateway SDK for integrating Stripe into Node.js applications. It provides a typed client for managing payments, customers, and subscriptions, offering specialized tools for executing secure financial transactions and managing billing resources. The library distinguishes itself through an idempotent API client that prevents duplicate operations using idempotency keys and retry logic with exponential backoff. It includes a webhook signature validator to verify that incoming HTTPS event notifications are authentic, and an async-iterator pagination wrapper for traversing large datasets. The project covers a wide range of capabilities, including subscription billing management, payment platform orchestration for connected accounts, and resource search. It provides comprehensive response handling through object expansion and field selection, along with security features for API request authentication and webhook verification. The library is written in TypeScript.
Uses JavaScript async iterators to stream paginated data from the API without buffering the entire payload.
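Auto-pagination through an async iterator can be sketched as a generator that fetches the next page only when the current one is used up. The example below is a Python analogue with a fake backend: `fetch_page`, `iter_items`, `FAKE_ITEMS`, and the `cursor` and `has_more` fields are illustrative assumptions and are not taken from the SDK.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Optional

FAKE_ITEMS = [{"id": f"item_{i}"} for i in range(7)]  # stand-in for server-side data


async def fetch_page(cursor: Optional[str], limit: int) -> Dict:
    """Hypothetical stand-in for one request to a cursor-paginated endpoint."""
    await asyncio.sleep(0)  # simulate network latency
    start = 0
    if cursor is not None:
        start = next(i for i, item in enumerate(FAKE_ITEMS) if item["id"] == cursor) + 1
    return {
        "data": FAKE_ITEMS[start:start + limit],
        "has_more": start + limit < len(FAKE_ITEMS),
    }


async def iter_items(limit: int = 3) -> AsyncIterator[Dict]:
    """Yield items one by one, requesting a new page only when needed."""
    cursor: Optional[str] = None
    while True:
        page = await fetch_page(cursor, limit)
        for item in page["data"]:
            yield item
        if not page["has_more"] or not page["data"]:
            return
        cursor = page["data"][-1]["id"]  # resume after the last item seen


async def collect_ids() -> List[str]:
    return [item["id"] async for item in iter_items(limit=3)]


ids = asyncio.run(collect_ids())
```

At most one page is held in memory at a time, and a consumer that breaks out of the loop early avoids the remaining requests.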
xtensor is a C++ multidimensional array library for numerical computing that provides N-dimensional containers with an interface mirroring the NumPy API. It utilizes a lazy evaluation expression engine to defer numerical computations until assignment, which minimizes memory allocations and intermediate copies. The library features a foreign memory array adaptor that allows it to wrap external buffers, such as NumPy arrays, to perform numerical operations in-place without duplicating data. It further optimizes performance through lazy broadcasting and a system that manages the lifetime of temp
Provides memory-efficient, STL-compatible forward and reverse iterators to process tensor data.
cuda-python provides low-level Python bindings for the CUDA Driver and Runtime APIs. It serves as a programmatic wrapper for controlling device memory, managing hardware toolchains, and orchestrating execution graphs on NVIDIA GPUs, allowing for the compilation and launching of parallel kernels directly from Python. The project enables the development of SIMT kernels and the execution of mathematical algorithms on device memory. It integrates pre-compiled bytecode as custom operators and interfaces with accelerated device libraries to access low-level hardware functions without leaving the la
Uses iterators to compute sequence elements on demand, minimizing the allocation of large intermediate arrays.