6 repositories
Systems designed to handle high-volume data operations by grouping multiple requests into single, efficient execution cycles.
Distinguishing note: Focuses on bulk data manipulation performance rather than standard CRUD operations.
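The grouping idea described above can be sketched in a few lines. This is a minimal illustration, not code from any of the listed repositories: `batched` and `execute_batch` are hypothetical names, and `execute_batch` merely stands in for one bulk operation such as a multi-row write.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists holding at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def execute_batch(batch: List[int]) -> int:
    """Hypothetical stand-in for one bulk execution cycle; returns items handled."""
    return len(batch)


# Ten individual requests collapse into three execution cycles.
requests = range(10)
cycles = [execute_batch(batch) for batch in batched(requests, 4)]
```

Here ten requests are handled in three calls rather than ten, which is the performance trade-off the category is concerned with.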
GitHub repositories matching data & databases · Batch Processing Engines.
Faceswap is a comprehensive framework for automated media manipulation and neural face synthesis. It provides a modular pipeline that manages the entire lifecycle of facial feature extraction, deep learning model training, and image conversion. By coordinating complex computer vision workflows, the system enables users to map facial identities between source and destination datasets while maintaining structural alignment and lighting consistency across video frames. The project distinguishes itself through a highly extensible plugin-based architecture that handles hardware-accelerated process
Segments media into manageable chunks to sustain high-throughput processing during intensive neural network operations.
Airflow is a platform for programmatically authoring, scheduling, and monitoring complex data pipelines. It functions as a workflow automation engine that manages the lifecycle of recurring business processes by executing code-defined task dependencies. By representing workflows as directed acyclic graphs, the system ensures that task execution order and data flow are explicitly defined and reliably maintained across distributed computing environments. The platform distinguishes itself through a highly modular, provider-based architecture that decouples core orchestration logic from external
Orchestrates batch workflows defined as code with centralized monitoring.
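To illustrate what "task dependencies as a directed acyclic graph" means in practice, the sketch below resolves an execution order with Python's standard-library `graphlib`. It deliberately does not use Airflow's own API; the task names and the `run_pipeline` helper are assumptions made for illustration only.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies: Dict[str, Set[str]] = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_pipeline(deps: Dict[str, Set[str]]) -> List[str]:
    """Return tasks in an order that respects every declared dependency."""
    order = []
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    for task in TopologicalSorter(deps).static_order():
        order.append(task)  # a real orchestrator would execute the task here
    return order


execution_order = run_pipeline(dependencies)
```

Because every dependency is declared up front, the order is derived from the graph rather than from the sequence in which tasks happen to be written.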
Feast is an open-source feature store for machine learning that provides a central platform for defining, storing, and serving features across both training and inference workflows. It operates as a declarative system where feature definitions are written as code in Python files, synchronized to a central registry, and made available for low-latency online retrieval or point-in-time correct historical joins for training datasets. The project abstracts storage behind a pluggable architecture, allowing offline and online backends to be swapped without changing retrieval logic, and coordinates ma
Replaces the default materialization engine with a user-defined class for batch ingestion, historical retrieval, and infrastructure lifecycle.
ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself
Enables the definition of custom logic for handling specialized data types, including metadata extraction and visualization generation for dashboard display.
AiNiee is an LLM-based localization tool that automates the translation of games, books, subtitles, and documents across multiple languages. It operates as a batch processing engine, translating entire folders of files in parallel while preserving directory structure, and includes a glossary management system that enforces terminology consistency using AI-powered glossaries, forbidden terms, and user-defined text substitution rules. The tool differentiates itself through key architectural decisions: it distributes translation requests across multiple API keys to bypass rate limits and acceler
Processes entire folders of files in parallel, pooling API keys for high-throughput batch translation.
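A rough sketch of the pattern described above, parallel work items drawing from a shared pool of keys, might look like the following. It is not AiNiee's implementation: `KeyPool`, `translate`, and `translate_all` are hypothetical names, and `translate` is a placeholder that makes no network call.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from threading import Lock
from typing import List, Sequence


class KeyPool:
    """Hands out keys in round-robin order; safe to share across threads."""

    def __init__(self, keys: Sequence[str]) -> None:
        if not keys:
            raise ValueError("at least one key is required")
        self._keys = cycle(keys)
        self._lock = Lock()

    def next_key(self) -> str:
        with self._lock:
            return next(self._keys)


def translate(text: str, api_key: str) -> str:
    """Placeholder for a real translation request; only tags the text."""
    return f"[{api_key}] {text.upper()}"


def translate_all(texts: List[str], keys: Sequence[str], workers: int = 4) -> List[str]:
    """Process all texts in parallel, spreading calls across the key pool."""
    pool = KeyPool(keys)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map preserves input order in its results.
        return list(executor.map(lambda t: translate(t, pool.next_key()), texts))


results = translate_all(["a", "b", "c"], ["key-1", "key-2"])
```

Results come back in input order, but which key serves which item depends on thread scheduling, so callers should not rely on a particular pairing.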
YT-Spammer-Purge is a content management and moderation tool built on the YouTube API, designed to detect and remove spam and scam comments on videos and channels. It works as a specialized filtering and management utility for identifying problematic content and banning unwanted users. The tool uses automated content filtering and pattern-based text detection to flag unwanted comments through regular expressions and character-level analysis. It supports a two-stage review process in which suspected spam is placed in a queue for manual approval before permanent deletion. The system includes administrative capabilities for batch processing comments across entire accounts, auditing moderation actions through detailed logs, and restricting moderator permissions to prevent administrative abuse.
Implements a batch processing engine to apply bulk removal and banning operations across multiple videos or channels.
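The two-stage flow in the description above (pattern-based flagging, then manual approval before one bulk removal) can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the patterns, the comment dictionaries, and the `delete_batch` callback are all hypothetical, and no YouTube API call is made.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical spam patterns, chosen only for this example.
SPAM_PATTERNS = [
    re.compile(r"free\s+gift", re.IGNORECASE),
    re.compile(r"t\.me/\S+", re.IGNORECASE),
]


def flag_suspects(comments: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Stage 1: queue every comment that matches any pattern."""
    return [c for c in comments if any(p.search(c["text"]) for p in SPAM_PATTERNS)]


def purge(
    queue: List[Dict[str, str]],
    approved_ids: Set[str],
    delete_batch: Callable[[List[str]], None],
) -> List[str]:
    """Stage 2: remove only manually approved comments, in a single batch."""
    approved = [c["id"] for c in queue if c["id"] in approved_ids]
    if approved:
        delete_batch(approved)
    return approved


comments = [
    {"id": "c1", "text": "Great video!"},
    {"id": "c2", "text": "Claim your FREE  gift now"},
    {"id": "c3", "text": "message me at t.me/winner"},
]
review_queue = flag_suspects(comments)

deleted: List[str] = []
# The reviewer approves only c2; c3 stays queued and untouched.
removed = purge(review_queue, {"c2"}, deleted.extend)
```

Keeping the approval step separate from detection means a false positive in the patterns costs a review, not a deleted comment.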