What are the best Data Iterators GitHub repositories?

Programming components that provide sequential access to elements within a large data collection during processing. Explore 21 awesome GitHub repositories matching data & databases · Data Iterators. Refine with filters or upvote what's useful. Top picks: kamranahmedse/developer-roadmap, deepfakes/faceswap, google/leveldb, immutable-js/immutable-js, huggingface/datasets, samber/lo, qax-os/excelize, rare-technologies/gensim, home-assistant/home-assistant.io, electronicarts/eastl.

Why is kamranahmedse/developer-roadmap a recommended Data Iterators repository?

Provides sequential access to elements within large data collections during processing.

Why is deepfakes/faceswap a recommended Data Iterators repository?

Serves as a base class for plugins to ingest and pass information through the extraction pipeline.

Why is google/leveldb a recommended Data Iterators repository?

Provides sequential iterators for traversing stored entries in forward or backward order.

Why is immutable-js/immutable-js a recommended Data Iterators repository?

Implements memory-efficient lazy iterators that defer data transformations until values are explicitly requested.

Why is huggingface/datasets a recommended Data Iterators repository?

Implements lazy, memory-efficient iterators to process large datasets on demand without loading them into physical memory.

Why is samber/lo a recommended Data Iterators repository?

Provides a comprehensive toolkit for memory-efficient, lazy data traversal and deferred computation of large or infinite sequences in Go.

Why is qax-os/excelize a recommended Data Iterators repository?

Emits data iteratively to maintain low memory usage during large-scale file processing.

Why is rare-technologies/gensim a recommended Data Iterators repository?

Implements data iterators to stream large text collections from disk, avoiding memory exhaustion.

Why is home-assistant/home-assistant.io a recommended Data Iterators repository?

Converts lazy sequences produced by filters into static lists to enable counting and sorting.

Why is electronicarts/eastl a recommended Data Iterators repository?

Provides standardized iterators for traversing diverse data collections without exposing underlying memory layouts.

21 repositories


kamranahmedse/developer-roadmap
357,434 stars
Developer Roadmap is a community-driven platform that provides structured, graph-based learning paths for software engineering. It serves as a comprehensive knowledge repository in which technical domains are organized into visual sequences to guide professional skill acquisition and career growth. The project is distinguished by a collaborative ecosystem that lets users contribute roadmaps, curate industry best practices, and maintain professional profiles. It integrates diagnostic assessment frameworks for evaluating technical competence, helping developers identify knowledge gaps and prepare for professional interviews through targeted learning sequences. Beyond its core mapping capabilities, the platform offers practical project ideas and interactive tutoring to reinforce engineering concepts. It provides a centralized space for the community to share resources, track progressive skill development, and navigate complex technical landscapes.
Provides sequential access to elements within large data collections during processing.
TypeScript · angular-roadmap · backend-roadmap · blockchain-roadmap
deepfakes/faceswap
55,289 stars
Faceswap is a comprehensive framework for automated media manipulation and neural face synthesis. It provides a modular pipeline that manages the entire lifecycle of facial feature extraction, deep learning model training, and image conversion. By coordinating complex computer vision workflows, the system enables users to map facial identities between source and destination datasets while maintaining structural alignment and lighting consistency across video frames. The project distinguishes itself through a highly extensible plugin-based architecture that handles hardware-accelerated process
Serves as a base class for plugins to ingest and pass information through the extraction pipeline.
Python · deep-face-swap · deep-learning · deep-neural-networks
google/leveldb
39,152 stars
LevelDB is an embedded database library and persistent storage engine that provides a sorted key-value store. It uses a log-structured merge-tree architecture to map byte arrays to values, running directly within a process to provide storage without the need for a separate server process. The system is distinguished by its use of custom comparison functions to define key ordering, enabling efficient range scans and sequenced lookups. It ensures data reliability through atomic batch execution, consistent snapshot generation, and log-based recovery after failures. The engine covers broad capab
Provides sequential iterators for traversing stored entries in forward or backward order.
C++
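
As a rough illustration of forward and backward traversal over sorted entries, here is a minimal Python sketch. `SortedStore` and its methods are hypothetical names invented for this example; it keeps keys in a sorted in-memory list and is not LevelDB's API or storage design.

```python
import bisect
from typing import Iterator, List, Tuple


class SortedStore:
    """Toy in-memory stand-in for a sorted key-value store (hypothetical, not LevelDB)."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._values: List[bytes] = []

    def put(self, key: bytes, value: bytes) -> None:
        # Keep keys sorted so that iteration order equals key order.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def forward(self, start: bytes = b"") -> Iterator[Tuple[bytes, bytes]]:
        """Yield entries in ascending key order, beginning at the first key >= start."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys):
            yield self._keys[i], self._values[i]
            i += 1

    def backward(self) -> Iterator[Tuple[bytes, bytes]]:
        """Yield entries in descending key order."""
        for i in range(len(self._keys) - 1, -1, -1):
            yield self._keys[i], self._values[i]


store = SortedStore()
for k, v in [(b"b", b"2"), (b"a", b"1"), (b"c", b"3")]:
    store.put(k, v)

forward_keys = [k for k, _ in store.forward()]
backward_keys = [k for k, _ in store.backward()]
range_scan = [k for k, _ in store.forward(start=b"b")]
```

Starting the forward iterator at a given key is what makes a range scan cheap here: the sketch seeks once with binary search and then walks neighbours.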
immutable-js/immutable-js
33,060 stars
Immutable.js is a library of persistent data structures and a functional state management toolkit. It provides a collection of immutable objects and arrays that prevent direct mutation to ensure predictable state management in JavaScript applications. The library utilizes structural sharing to efficiently create new versions of data without full copying and implements lazy sequence processing to chain data transformations that execute only when values are requested. It also supports batch mutation processing, allowing multiple changes to be applied to a temporary mutable copy before returning
Implements memory-efficient lazy iterators that defer data transformations until values are explicitly requested.
TypeScript
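
The idea of deferring transformations until values are requested can be sketched with Python generators. This is not Immutable.js code: `lazy_map`, `lazy_filter`, and the `calls` log are hypothetical helpers that exist only to show which inputs actually get processed.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

calls: List[int] = []  # records which inputs were actually transformed


def lazy_map(fn: Callable[[int], int], items: Iterable[int]) -> Iterator[int]:
    """Apply fn to each item, but only when the consumer asks for the next value."""
    for item in items:
        calls.append(item)
        yield fn(item)


def lazy_filter(pred: Callable[[int], bool], items: Iterable[int]) -> Iterator[int]:
    """Pass through only the items for which pred is true, one at a time."""
    for item in items:
        if pred(item):
            yield item


# Building the chain does no work yet, even over a million inputs.
pipeline = lazy_filter(
    lambda x: x % 2 == 1,
    lazy_map(lambda x: x * x, range(1, 1_000_000)),
)
calls_before = list(calls)

# Requesting two results pulls just enough inputs through the chain.
first_two = list(islice(pipeline, 2))
calls_after = list(calls)
```

Only the inputs 1, 2, and 3 are squared before the second odd square is found; the rest of the range is never touched.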
huggingface/datasets
21,643 stars
Datasets is a library designed for the management, processing, and sharing of large-scale data collections for machine learning workflows. It functions as both a data processing framework and a versioning platform, providing tools to organize, filter, and transform massive datasets while ensuring reproducibility across research and development teams. The library distinguishes itself by enabling the handling of datasets that exceed available system memory. It utilizes memory-mapped file access, disk-based caching, and lazy iterative streaming to maintain performance when working with large-sca
Implements lazy, memory-efficient iterators to process large datasets on demand without loading them into physical memory.
Python · ai · artificial-intelligence · computer-vision
samber/lo
21,333 stars
This library is a collection of generic utilities for the Go programming language designed to simplify the manipulation of slices and maps. It provides a functional toolkit that enables developers to perform data transformations, such as filtering, mapping, and reducing, while maintaining strict type safety through the use of language-level generics. The project distinguishes itself by offering a dual approach to data processing that balances functional programming patterns with performance-oriented execution. It supports both immutable functional pipelines for predictable state transitions a
Provides a comprehensive toolkit for memory-efficient, lazy data traversal and deferred computation of large or infinite sequences in Go.
Go · constraints · contract · filterable
qax-os/excelize
20,682 stars
Excelize is a library for reading and writing spreadsheet files in the Office Open XML format. It provides a comprehensive suite of tools for programmatically creating, modifying, and analyzing workbooks, worksheets, and cell data, ensuring compatibility across various office software suites through structured XML serialization. The library distinguishes itself with a built-in formula calculation engine that evaluates complex mathematical and logical expressions directly against workbook data. It also features a memory-mapped streaming architecture, which allows for the efficient processing o
Emits data iteratively to maintain low memory usage during large-scale file processing.
Go · agent · ai · analytics
RaRe-Technologies/gensim
16,442 stars
Gensim is an unsupervised natural language processing toolkit designed for topic modeling, word embedding training, and the processing of large-scale text corpora. It provides a framework for discovering latent themes and semantic structures in text without the need for labeled data. The toolkit is distinguished by its ability to handle datasets that exceed system memory through iterator-based data streaming from disk. It also supports distributed model training, allowing complex modeling tasks to be executed across computer clusters. The library covers a broad range of analysis capabilities
Implements data iterators to stream large text collections from disk, avoiding memory exhaustion.
Python
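
A minimal sketch of streaming a text collection from disk, assuming one document per line. `LineCorpus` is a hypothetical helper written in plain Python for illustration; it does not use or reproduce any Gensim class.

```python
import os
import tempfile
from typing import Iterator, List


class LineCorpus:
    """Re-iterable corpus that reads one document per line from disk (hypothetical helper)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def __iter__(self) -> Iterator[List[str]]:
        # The file is reopened on every pass, so the corpus can be iterated
        # repeatedly while holding only one line in memory at a time.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                yield line.lower().split()


# Write a tiny two-document file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Human machine interface\nGraph of trees\n")
    corpus_path = tmp.name

corpus = LineCorpus(corpus_path)
first_pass = list(corpus)
second_pass = list(corpus)  # a second pass works because __iter__ reopens the file
os.remove(corpus_path)
```

Making the corpus an object with `__iter__`, rather than a one-shot generator, matters when a training procedure needs several passes over the same data.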
home-assistant/home-assistant.io
9,466 stars
Home Assistant is a local home automation platform and server that acts as an IoT device orchestrator. It integrates diverse smart home hardware by wrapping third-party APIs into a standardized logic layer and stores all system state and historical statistics on local hardware to eliminate cloud dependencies. The system functions as a Matter IoT controller and an MQTT home automation bridge, allowing for local interoperability between different manufacturers. It features a state-based entity model and an internal event bus that decouple physical device logic from system automation. The platf
Converts lazy sequences produced by filters into static lists to enable counting and sorting.
HTML · documentation · hacktoberfest · hass
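
The reason a lazy sequence has to be turned into a list before counting can be shown in plain Python, assuming the filters in question behave like Python's built-in `filter()`. The room names and temperatures below are made-up sample data, and this is not Home Assistant template syntax.

```python
temperatures = {"kitchen": 21.5, "garage": 12.0, "bedroom": 19.0, "attic": 9.5}

# filter() returns a lazy iterator: it has no length and can be consumed only once.
warm_rooms = filter(lambda item: item[1] >= 15.0, temperatures.items())

try:
    len(warm_rooms)
    has_len = True
except TypeError:
    has_len = False

# Materializing the iterator gives a static list that can be counted,
# indexed, and sorted as many times as needed.
warm_list = list(warm_rooms)
warm_count = len(warm_list)
warm_sorted = sorted(warm_list, key=lambda item: item[1])
```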
electronicarts/EASTL
9,273 stars
EASTL is a C++ Standard Template Library implementation consisting of containers, iterators, and algorithms. It provides cross-platform data structures and a template-based algorithm library designed for use in resource-constrained game engine environments. The library focuses on game engine memory management, providing specialized utilities that ensure predictable memory allocation and high-performance access for real-time applications. These containers maintain consistent behavior across different operating systems and hardware platforms. The project covers high-performance C++ development
Provides standardized iterators for traversing diverse data collections without exposing underlying memory layouts.
C++ · c-plus-plus · c-plus-plus-11 · c-plus-plus-14
iamseancheney/python_for_data_analysis_2nd_chinese_version
8,937 stars
This project is an educational resource and a collection of instructional materials for performing data manipulation and statistical analysis using Python. It provides a comprehensive set of guides and code examples for using the Pandas, NumPy, and Matplotlib libraries to analyze structured data. The resource includes a dedicated guide for reshaping, cleaning, and aggregating tabular data and time series via Pandas, alongside a reference for high-performance vectorized operations and linear algebra using NumPy. It also features tutorials for creating publication-quality charts, distribution p
Uses generators to produce sequences of values on demand, reducing memory consumption for large datasets.
matplotlib · numpy · pandas
nodejs/nodejs.org
6,842 stars
Node.js is an open-source, cross-platform JavaScript runtime environment built on the V8 engine, designed for executing JavaScript code outside a web browser. It operates as a server-side JavaScript platform with an event-driven, non-blocking I/O architecture that enables building scalable network applications and web servers. The runtime integrates the CommonJS module system for synchronous module loading and the npm ecosystem for sharing and reusing packages. The platform provides comprehensive capabilities for web server development, including creating HTTP and HTTPS servers, managing HTTP
Supports processing streaming data with async iterators for chunk-by-chunk consumption without full buffering.
TypeScript · nextjs · node · nodejs
dtao/lazy.js
5,975 stars
Lazy.js is a JavaScript library that implements a lazy evaluation model for processing collections and data streams. It defers all computation until iteration begins, building chains of transformations that execute only when values are consumed, avoiding intermediate arrays and buffering. The library wraps data sources into a uniform sequence interface, enabling operations like map and filter to be chained together without materializing intermediate results. The library extends lazy processing beyond simple collections to handle asynchronous data sources, DOM events, strings, and Node.js stre
Integrates with asynchronous data sources by yielding values at timed intervals or from streams without blocking.
JavaScript
hadley/r4ds
5,070 stars
r4ds is a data science curriculum and educational resource designed for mastering the R programming language. It offers a structured learning path through the end-to-end process of importing, cleaning, transforming, and visualizing data. The project emphasizes a reproducible data science guide and a comprehensive data wrangling curriculum. It includes specialized tutorials on the grammar of graphics for layered data visualization, and technical publications built with Quarto that combine executable code with narrative prose. The material covers a wide range of analytical capabilities, including data ingestion from diverse sources, joining relational data, and handling categorical variables. It also addresses data cleaning, mathematical modeling, and generating professional reports and presentations in multiple formats. The curriculum focuses on the practical application of functional programming and "tidy data" principles to create transparent, repeatable analyses.
Demonstrates how to apply a consistent set of actions across data collections using functional programming.
R
pytoolz/toolz
5,117 stars
Toolz is a Python library that implements functional programming utilities for iterable transformation, dictionary manipulation, function composition, and lazy evaluation. It provides a set of pure functions designed to work with Python's built-in data structures, enabling concise and composable data processing workflows. What distinguishes toolz is its support for curried partial application, allowing functions to be incrementally applied and reused. It includes dictionary-centric operations that handle nested structures, and offers iterable chain transformers that combine mapping, filtering
Processes sequences on-demand using generators for memory-efficient handling of large data streams.
Python
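
On-demand processing of a sequence, including an unbounded one, can be sketched with the standard library alone. The `chunked` helper below is a hypothetical function written for this example using `itertools`; it is not taken from toolz.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items, pulling input only as needed."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # the input is exhausted
        yield chunk


# An infinite generator of even numbers: safe because nothing is computed
# until a consumer asks for it.
evens = (n for n in count() if n % 2 == 0)
first_chunks = list(islice(chunked(evens, 3), 2))

# On a finite input, the last chunk may be shorter than `size`.
tail = list(chunked(range(5), 2))
```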
gajus/slonik
4,910 stars
Slonik is a type-safe PostgreSQL client for Node.js that uses tagged template literals to ensure parameters are bound and protected against injection attacks. It provides a framework for connecting applications to PostgreSQL with automatic type checking for queries and database schemas. The project is distinguished by a specialized SQL query linter that detects invalid columns and type mismatches by checking code against a live database schema during development. It also includes a high-performance binary bulk inserter for loading large datasets using native binary serialization, and a connection pool manager capable of dynamically routing queries between primary and secondary nodes. The library covers a broad set of database capabilities, including atomic transaction management, dynamic SQL query building, and processing large result sets via async-iterable streaming. It further offers middleware interceptors for logging and benchmarking, custom type parsing, and asynchronous callback mechanisms for refreshing database authentication credentials.
Provides memory-efficient processing of large database result sets using async iterable streams.
TypeScript
pytorch/ignite
4,770 stars
Ignite is a high-level training framework for PyTorch neural networks that serves as a training engine and deep learning lifecycle manager. It provides a structured system for organizing and automating training and evaluation loops, managing data iterators and triggering event handlers at specific stages of model training. The project is distinguished by a comprehensive suite of tools for distributed training and model evaluation. It includes utilities for synchronizing gradients and coordinating collective communication across multiple GPUs or nodes, as well as an evaluation suite for computing performance metrics and performing k-fold cross-validation. Its broader capabilities cover training workflow automation, including learning rate scheduling, early stopping, and hyperparameter optimization. The framework also provides observability tools for experiment tracking, runtime profiling, and mixed-precision training to optimize memory usage. State persistence mechanisms are included to manage model checkpoints and recover training sessions. Containerized environments are available to simplify deployment and environment setup.
Controls finite or infinite data streams by determining epoch lengths or restarting exhausted iterators.
Python
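
One way to picture "determining epoch lengths or restarting exhausted iterators" is a loop that draws a fixed number of batches per epoch and re-creates the iterator when it runs out. `run_epochs` is a hypothetical function for illustration, not Ignite's API, and it assumes `data` can be iterated more than once (for example a list).

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def run_epochs(data: Iterable[T], epoch_length: int, max_epochs: int) -> List[List[T]]:
    """Collect `epoch_length` batches per epoch, restarting `data` whenever it runs out."""
    epochs: List[List[T]] = []
    iterator = iter(data)
    for _ in range(max_epochs):
        batches: List[T] = []
        while len(batches) < epoch_length:
            try:
                batches.append(next(iterator))
            except StopIteration:
                # The iterator is exhausted mid-epoch: restart it and continue.
                iterator = iter(data)
                try:
                    batches.append(next(iterator))
                except StopIteration:
                    # Empty or one-shot data cannot be restarted; fail instead of looping forever.
                    raise ValueError("data yields no batches after restart") from None
        epochs.append(batches)
    return epochs


# Three batches in the data, two batches per epoch, three epochs.
schedule = run_epochs(["b0", "b1", "b2"], epoch_length=2, max_epochs=3)
```

Because the epoch length is decoupled from the size of the data, an epoch boundary no longer has to coincide with the end of the underlying iterator.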
stripe/stripe-node
4,442 stars
This is a typed server-side library and payment gateway SDK for integrating Stripe into Node.js applications. It provides a typed client to manage payments, customers, and subscriptions, while offering specialized tools for executing secure financial transactions and managing billing resources. The library distinguishes itself through an idempotent API client that prevents duplicate operations using idempotency keys and exponential backoff retry logic. It includes a webhook signature validator to verify that incoming HTTPS event notifications are authentic and an async-iterator pagination wra
Uses JavaScript async iterators to stream paginated data from the API without buffering the entire payload.
TypeScript
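
Async-iterator pagination can be sketched in Python with an async generator. Everything here is hypothetical: `FAKE_PAGES`, `fetch_page`, and the cursor values stand in for a paginated HTTP API and are not part of the Stripe SDK.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Optional, Tuple

# Hypothetical in-memory "API": each cursor maps to (items, next_cursor).
FAKE_PAGES: Dict[Optional[str], Tuple[List[str], Optional[str]]] = {
    None: (["cus_1", "cus_2"], "p2"),
    "p2": (["cus_3", "cus_4"], "p3"),
    "p3": (["cus_5"], None),
}
requested_cursors: List[Optional[str]] = []  # log of pages actually fetched


async def fetch_page(cursor: Optional[str]) -> Tuple[List[str], Optional[str]]:
    """Stand-in for one network request (hypothetical)."""
    requested_cursors.append(cursor)
    await asyncio.sleep(0)  # yield control, as real I/O would
    return FAKE_PAGES[cursor]


async def iter_items() -> AsyncIterator[str]:
    """Yield items one by one, fetching the next page only when it is needed."""
    cursor: Optional[str] = None
    while True:
        items, cursor = await fetch_page(cursor)
        for item in items:
            yield item
        if cursor is None:
            return


async def first_n(n: int) -> List[str]:
    collected: List[str] = []
    async for item in iter_items():
        collected.append(item)
        if len(collected) == n:
            break  # stop early; later pages are never requested
    return collected


first_three = asyncio.run(first_n(3))
```

Stopping after three items means only the first two pages are fetched, which is the memory and bandwidth benefit the description refers to.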
xtensor-stack/xtensor
3,748 stars
xtensor is a C++ multidimensional array library for numerical computing that provides N-dimensional containers with an interface mirroring the NumPy API. It utilizes a lazy evaluation expression engine to defer numerical computations until assignment, which minimizes memory allocations and intermediate copies. The library features a foreign memory array adaptor that allows it to wrap external buffers, such as NumPy arrays, to perform numerical operations in-place without duplicating data. It further optimizes performance through lazy broadcasting and a system that manages the lifetime of temp
Provides memory-efficient, STL-compatible forward and reverse iterators to process tensor data.
C++ · c-plus-plus-14 · multidimensional-arrays · numpy
NVIDIA/cuda-python
3,170 stars
cuda-python provides low-level Python bindings for the CUDA Driver and Runtime APIs. It serves as a programmatic wrapper for controlling device memory, managing hardware toolchains, and orchestrating execution graphs on NVIDIA GPUs, allowing for the compilation and launching of parallel kernels directly from Python. The project enables the development of SIMT kernels and the execution of mathematical algorithms on device memory. It integrates pre-compiled bytecode as custom operators and interfaces with accelerated device libraries to access low-level hardware functions without leaving the la
Uses iterators to compute sequence elements on demand, minimizing the allocation of large intermediate arrays.
Cython