12 repositories
Systems for offloading and distributing computational tasks across multiple nodes or external services.
Distinguishing note: Focuses on the distribution of processing tasks.
Open WebUI is a self-hosted, web-based platform designed for interacting with local and remote artificial intelligence models. It functions as a unified interface and orchestration suite, enabling users to build, deploy, and manage specialized AI agents equipped with custom instructions, external tool access, and private knowledge bases. The platform distinguishes itself through a modular architecture that supports complex AI workflows. It features a plugin-based framework for custom logic and pipeline-based request processing, allowing developers to filter or transform data streams before th
Allows offloading processing tasks to external machines for distributed setups.
Airflow is a workflow orchestration platform for authoring, scheduling, and monitoring complex data pipelines as code using Python. It employs a DAG-based task scheduler to manage execution timing and dependencies via directed acyclic graphs, utilizing a distributed task execution engine to run workloads across a cluster of worker nodes. The platform provides a data pipeline monitor for tracking the health and execution history of programmatic workflows. This includes a web interface for workflow progress visualization and health monitoring to identify and troubleshoot pipeline failures. The
Offloads and distributes heavy computational workloads across a cluster of worker nodes for parallel processing.
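The DAG-based scheduling idea in the Airflow entry can be illustrated with a minimal sketch. This is not Airflow's API or its executor: `run_dag`, `dependencies`, and `tasks` are hypothetical names, and a local thread pool stands in for a cluster of worker nodes. It only shows how a directed acyclic graph determines which tasks may run in parallel and which must wait.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(dependencies, tasks, max_workers=4):
    """Run callables in dependency order; independent tasks run in parallel.

    dependencies: maps a task name to the set of task names it depends on.
    tasks: maps a task name to a zero-argument callable.
    Returns the task names in the order they finished.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}  # future -> task name
        while sorter.is_active():
            # Submit every task whose dependencies are all done.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            # Block until at least one running task completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any exception from the task
                sorter.done(name)  # unlock tasks that depended on this one
                finished.append(name)
    return finished
```

In this sketch a failing task stops the whole run; a real orchestrator would typically record the failure and apply a retry policy instead.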
Fastai is a high-level deep learning library built on PyTorch that provides a unified interface for managing the entire machine learning lifecycle. It functions as a comprehensive training toolkit, abstracting hardware management and automating complex training loops to simplify the construction and execution of neural network models. The framework is distinguished by its notebook-centric development environment and a type-dispatching data pipeline that automatically applies transformations based on input data formats. It emphasizes transfer learning through discriminative layer-wise optimiza
Implements barriers in multi-process training to synchronize execution points across distributed sub-processes.
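A barrier of the kind the fastai note mentions can be sketched with the Python standard library. This is an illustration only, not fastai's or PyTorch's implementation: threads stand in for distributed sub-processes, and `run_with_barrier` is a hypothetical helper. The point is that no worker passes the synchronization point until every worker has reached it.

```python
import threading


def run_with_barrier(num_workers=3):
    """Each worker logs 'before', waits at the barrier, then logs 'after'."""
    barrier = threading.Barrier(num_workers)
    events = []
    lock = threading.Lock()

    def worker(rank):
        with lock:
            events.append(("before", rank))
        barrier.wait()  # blocks until all num_workers threads have arrived
        with lock:
            events.append(("after", rank))

    threads = [
        threading.Thread(target=worker, args=(rank,))
        for rank in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events
```

Because every worker records its "before" event prior to calling `wait()`, all "before" events appear in the log ahead of any "after" event, regardless of thread scheduling.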
Bull is a Node.js library for managing distributed jobs and message queues using Redis as the primary data store. It functions as a distributed task worker, job scheduler, and priority queue manager designed to handle asynchronous workloads across multiple processes. The project distinguishes itself by providing a persistent communication channel that decouples servers through the exchange of serializable data objects. It improves reliability by detecting stalled tasks and recovering from process crashes so that every queued job is completed. The system covers a broad ran
Distributes asynchronous task processing across multiple Node.js worker processes using a shared Redis backend.
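The stalled-task recovery described for Bull can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not Bull's API and not how it uses Redis: `TinyQueue`, `lock_timeout`, and `requeue_stalled` are hypothetical names, and the caller passes the current time explicitly so the behavior is easy to follow.

```python
from collections import deque


class TinyQueue:
    """In-memory job queue with lock expiry (hypothetical, for illustration)."""

    def __init__(self, lock_timeout=30.0):
        self.waiting = deque()  # job ids not yet picked up
        self.active = {}        # job id -> time at which its lock expires
        self.completed = []
        self.lock_timeout = lock_timeout

    def add(self, job_id):
        self.waiting.append(job_id)

    def take(self, now):
        """A worker claims the next job and holds a lock until now + timeout."""
        if not self.waiting:
            return None
        job_id = self.waiting.popleft()
        self.active[job_id] = now + self.lock_timeout
        return job_id

    def complete(self, job_id):
        del self.active[job_id]
        self.completed.append(job_id)

    def requeue_stalled(self, now):
        """Move jobs whose lock has expired back to the waiting list."""
        stalled = [job for job, expiry in self.active.items() if expiry <= now]
        for job in stalled:
            del self.active[job]
            self.waiting.append(job)
        return stalled
```

If a worker crashes after `take`, it never calls `complete`, so its lock eventually expires and a periodic `requeue_stalled` call hands the job to another worker.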
ImageMagick is a comprehensive software suite for the creation, editing, composition, and conversion of digital images. It functions as both a command-line utility for batch processing and automation, and as a programming library that allows developers to integrate advanced image manipulation capabilities into external applications. The project is distinguished by its modular architecture, which supports hundreds of image formats through a pluggable coder system and external delegate libraries. It is designed for high-performance environments, utilizing memory-mapped pixel caching, stream-ori
Offloads pixel cache operations to remote servers to support large-scale image processing across networked machines.
Nightingale is a Prometheus-compatible monitoring and alerting platform designed to centralize telemetry management across multiple time-series databases. It functions as a multi-source alerting engine and metric data pipeline that ingests telemetry via remote write protocols and triggers alarms based on data from sources such as Prometheus, Elasticsearch, Loki, and ClickHouse. The system is distinguished by its automated alert healing system, which executes predefined scripts and RPC-based corrective actions when monitoring thresholds are breached. It supports distributed alert processing, a
Spreads alert evaluation tasks across multiple processing nodes to balance load and provide automatic failover.
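One common way to spread evaluation tasks across nodes with automatic failover is rendezvous (highest-random-weight) hashing. The sketch below is an assumption about a suitable technique, not a description of how Nightingale assigns work; `owner`, `assign`, and the node and rule names are hypothetical.

```python
import hashlib


def owner(rule_id, nodes):
    """Pick the node with the highest hash score for this rule."""
    def score(node):
        return hashlib.sha256(f"{node}:{rule_id}".encode()).hexdigest()
    return max(nodes, key=score)


def assign(rule_ids, nodes):
    """Map every rule to the node responsible for evaluating it."""
    return {rule_id: owner(rule_id, nodes) for rule_id in rule_ids}
```

When a node disappears from `nodes`, only the rules it owned are reassigned; every other rule keeps its owner, because removing a non-winning candidate does not change the maximum.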
Meshroom is a node-based photogrammetry software designed to transform collections of two-dimensional images into three-dimensional models and scene geometry. It provides a visual interface for constructing and managing modular data pipelines, allowing users to automate complex computer vision tasks such as feature extraction, depth map estimation, and mesh generation. The software distinguishes itself through a distributed computational framework that dispatches resource-intensive tasks across local hardware or remote render farms. By utilizing a directed acyclic graph execution model, it en
Dispatches and manages heavy reconstruction tasks across local hardware or remote render farms to optimize execution speed.
Synapse is a decentralized communication server implementation that enables real-time messaging and data exchange across the global Matrix federation. It functions as a homeserver, allowing operators to host their own nodes while maintaining control over personal data and user identity within a distributed network. The server utilizes a federated messaging protocol to exchange messages and user data with independent servers, ensuring consistent state across the network. To support high-traffic environments, it employs a distributed service architecture that offloads tasks to independent backg
Distributes server workloads across independent background processes to enable horizontal scaling and high availability.
Colyseus is a real-time multiplayer game framework for Node.js that provides an authoritative server model, delta-compressed state synchronization, and room-based session orchestration. It is designed to handle the core infrastructure of multiplayer games, including matchmaking, state management, and scalable process distribution across multiple servers. The framework distinguishes itself through its schema-based state definition, which enables automatic serialization and change tracking, combined with a binary WebSocket protocol for low-latency updates. Its matchmaking pipeline routes player
Distributes room instances across multiple Node.js processes or machines via a central coordinator.
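KBEngine is a distributed game server engine and backend infrastructure designed for massively multiplayer online environments. It provides a multi-process architecture to handle high player concurrency and real-time interaction in shared virtual worlds. The system features a scriptable game logic framework that combines a high-performance core with a high-level scripting language. This allows game behavior to be modified at runtime through hotfixes, so logic can be updated without restarting the server. The engine manages server scaling through dynamic load balancing across multiple hardware nodes and ensures a consistent world view through real-time state synchronization between the server and game clients. It also includes game data persistence mechanisms, such as periodic entity backups and server state snapshots. Administrative features include real-time server debugging tools for monitoring system status and managing the server lifecycle.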
Utilizes a distributed multi-process architecture and dynamic load balancing to handle high player concurrency across hardware nodes.
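Tdarr is a distributed video processing and media library automation tool. It uses a server-node architecture to manage the scanning, analysis, and standardization of audio and video files according to custom rules. The system distributes heavy computational workloads, such as transcoding and health checks, to multiple remote nodes to optimize hardware utilization. It uses a plugin-based pipeline to run sequences of filters and transformations, automating media conversion through FFmpeg and HandBrake to standardize file formats and containers. The project covers media library health auditing to verify file integrity and detect playback errors, as well as metadata-driven indexing to extract technical attributes from video files. It includes file system monitoring to automatically trigger processing jobs when new media is detected.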
Offloads compute-intensive video processing and health auditing to a distributed network of remote nodes.
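Bee-queue is a Node.js background processing system that uses Redis for job queuing and persistence. It is designed to offload heavy tasks from the main execution thread to background workers to keep applications responsive. The project provides distributed job processing, allowing worker nodes to run across multiple processes to handle large volumes of tasks concurrently. It ensures reliable task execution through automatic retries and recovery of stalled processes. Its capabilities cover asynchronous task scheduling for delayed jobs, concurrency control for worker nodes, and job lifecycle management. It includes tools for monitoring queue health, tracking job progress, and retrieving results by job status. The system supports bulk job enqueueing to reduce network overhead, and allows custom job identifiers and configurable backoff strategies for failed tasks.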
Distributes computational tasks across multiple worker processes to handle high volumes concurrently.
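Automatic retries with a configurable backoff strategy, as mentioned in the Bee-queue entry, can be sketched as follows. This is a generic illustration in Python, not Bee-queue's API: `backoff_delays`, `run_with_retries`, and the strategy names used here are assumptions chosen for the example.

```python
import time


def backoff_delays(retries, base_delay=1.0, strategy="exponential"):
    """Return the delay (in seconds) to wait before each retry."""
    if strategy == "immediate":
        return [0.0] * retries
    if strategy == "fixed":
        return [base_delay] * retries
    if strategy == "exponential":
        return [base_delay * 2 ** i for i in range(retries)]
    raise ValueError(f"unknown strategy: {strategy}")


def run_with_retries(job, retries=3, base_delay=1.0,
                     strategy="exponential", sleep=time.sleep):
    """Call job(); on failure, wait and retry up to `retries` times."""
    delays = backoff_delays(retries, base_delay, strategy)
    attempt = 0
    while True:
        try:
            return job()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])
            attempt += 1
```

The `sleep` parameter is injectable so the waiting behavior can be replaced in tests or swapped for a scheduler that re-enqueues the job instead of blocking a worker.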