6 repositories
Systems designed to handle high-volume data operations by grouping multiple requests into single, efficient execution cycles.
Distinguishing note: Focuses on bulk data manipulation performance rather than standard CRUD operations.
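As a rough illustration of the grouping idea in the category description, the sketch below collects individual items into fixed-size batches and hands each batch to a single call. The names `batched`, `run_in_batches`, and `execute` are hypothetical and do not come from any repository listed here.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def run_in_batches(
    items: Iterable[T],
    size: int,
    execute: Callable[[List[T]], List[R]],
) -> List[R]:
    """Call `execute` once per batch instead of once per item."""
    results: List[R] = []
    for batch in batched(items, size):
        results.extend(execute(batch))
    return results
```

With five items and a batch size of two, `execute` would be called three times rather than five, which is the saving that batch engines aim for when each call carries fixed overhead.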
Explore 6 GitHub repositories matching Data & Databases · Batch Processing Engines.
Faceswap is a comprehensive framework for automated media manipulation and neural face synthesis. It provides a modular pipeline that manages the entire lifecycle of facial feature extraction, deep learning model training, and image conversion. By coordinating complex computer vision workflows, the system enables users to map facial identities between source and destination datasets while maintaining structural alignment and lighting consistency across video frames. The project distinguishes itself through a highly extensible plugin-based architecture that handles hardware-accelerated process
Segments media into manageable chunks to sustain high-throughput processing during intensive neural network operations.
Airflow is a platform for programmatically authoring, scheduling, and monitoring complex data pipelines. It functions as a workflow automation engine that manages the lifecycle of recurring business processes by executing code-defined task dependencies. By representing workflows as directed acyclic graphs, the system ensures that task execution order and data flow are explicitly defined and reliably maintained across distributed computing environments. The platform distinguishes itself through a highly modular, provider-based architecture that decouples core orchestration logic from external
Orchestrates batch workflows defined as code with centralized monitoring.
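To make the directed-acyclic-graph idea concrete, here is a minimal sketch that derives a valid execution order from task dependencies. It uses Python's standard `graphlib` module (Python 3.9+) rather than Airflow's own API, and the task names and the `execution_order` helper are assumptions chosen for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return tasks in an order where every task follows its prerequisites.

    `dependencies` maps each task name to the set of tasks it depends on.
    A cyclic graph raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical three-step pipeline: each step depends on the previous one.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}
order = execution_order(pipeline)
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering step; the sketch covers only the dependency resolution.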
Feast is an open-source feature store for machine learning that provides a central platform for defining, storing, and serving features across both training and inference workflows. It operates as a declarative system where feature definitions are written as code in Python files, synchronized to a central registry, and made available for low-latency online retrieval or point-in-time correct historical joins for training datasets. The project abstracts storage behind a pluggable architecture, allowing offline and online backends to be swapped without changing retrieval logic, and coordinates ma
Replaces the default materialization engine with a user-defined class for batch ingestion, historical retrieval, and infrastructure lifecycle.
ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself
Enables the definition of custom logic for handling specialized data types, including metadata extraction and visualization generation for dashboard display.
AiNiee is an LLM-based localization tool that automates the translation of games, books, subtitles, and documents across multiple languages. It operates as a batch processing engine, translating entire folders of files in parallel while preserving directory structure, and includes a glossary management system that enforces terminology consistency using AI-powered glossaries, forbidden terms, and user-defined text substitution rules. The tool differentiates itself through key architectural decisions: it distributes translation requests across multiple API keys to bypass rate limits and acceler
Processes entire folders of files in parallel, pooling API keys for high-throughput batch translation.
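One way to picture folder-level parallelism with pooled API keys is the sketch below, which assigns keys to files round-robin and runs the work in a thread pool. This is an assumption about how such a scheme could look, not AiNiee's actual implementation; `assign_keys`, `translate_folder`, and the `translate` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from pathlib import Path
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

P = TypeVar("P")


def assign_keys(paths: Sequence[P], api_keys: Sequence[str]) -> List[Tuple[P, str]]:
    """Pair each path with an API key, cycling through the key pool."""
    if not api_keys:
        raise ValueError("at least one API key is required")
    return list(zip(paths, cycle(api_keys)))


def translate_folder(
    root: str,
    api_keys: Sequence[str],
    translate: Callable[[Path, str], str],
    max_workers: int = 4,
) -> Dict[Path, str]:
    """Run `translate(path, key)` for every file under `root` in parallel.

    Results are keyed by path relative to `root`, so the directory
    structure can be reproduced on output.
    """
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    jobs = assign_keys(files, api_keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(lambda job: translate(job[0], job[1]), jobs))
    return {path.relative_to(root): out for (path, _), out in zip(jobs, outputs)}
```

Spreading requests across keys this way divides the per-key request rate by the size of the pool, which is the effect the description attributes to key pooling.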
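YT-Spammer-Purge is a YouTube API content management and moderation tool designed to detect and remove spam and scam comments from videos and channels. It serves as a specialized filter and administration tool for identifying problematic content and banning offending users. The tool relies on automated content filtering and pattern-based text detection, using regular expressions and character-level analysis to flag spam comments. It supports a two-stage moderation workflow in which suspected spam comments are buffered into a review queue and permanently deleted only after manual approval. The system includes administrative features for batch-processing comments across an entire account, auditing moderation actions through detailed logs, and restricting moderator permissions to prevent administrative abuse.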
Implements a batch processing engine to apply bulk removal and banning operations across multiple videos or channels.
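The two-stage flow in the description above (flag by pattern, hold for review, remove in bulk after approval) can be sketched as follows. The patterns and the `ReviewQueue` class are hypothetical and are not taken from YT-Spammer-Purge; a real tool would call the YouTube API where this sketch only moves strings between lists.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List

# Hypothetical patterns for illustration only.
SPAM_PATTERNS = [
    re.compile(r"free\s+gift", re.IGNORECASE),
    re.compile(r"telegram\s*@\w+", re.IGNORECASE),
]


@dataclass
class ReviewQueue:
    pending: List[str] = field(default_factory=list)
    removed: List[str] = field(default_factory=list)

    def scan(self, comments: Iterable[str]) -> None:
        """Stage 1: buffer comments that match any spam pattern."""
        for comment in comments:
            if any(pattern.search(comment) for pattern in SPAM_PATTERNS):
                self.pending.append(comment)

    def approve_all(self) -> int:
        """Stage 2: after manual approval, remove every pending comment in one batch."""
        count = len(self.pending)
        self.removed.extend(self.pending)
        self.pending.clear()
        return count
```

Nothing is removed during `scan`; removal happens only when `approve_all` is called, which mirrors the manual approval step in the description.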