26 repositories
Systems designed to distribute computational workloads across multiple networked machines.
Distinguishing note: Focuses on workload distribution and parallel processing across a cluster rather than general cluster management.
Explore 26 GitHub repositories in DevOps & Infrastructure · Distributed Computing Frameworks.
Exo is a distributed inference engine designed to run machine learning models across local hardware. It functions as a network orchestration layer that automatically discovers available devices to form a unified computing cluster, allowing users to scale artificial intelligence workloads by distributing computational tasks across multiple machines. The platform distinguishes itself through its ability to manage the entire lifecycle of local models while providing a standardized gateway for external applications. By translating local model outputs into industry-standard formats, it enables exi
Distributes large computational workloads across multiple local devices to improve processing performance.
Ray is a distributed computing framework designed to scale Python and Java applications across clusters by abstracting task scheduling and resource management. It functions as a resource-aware execution engine that manages task dependencies, placement, and fault tolerance across networked compute nodes. At its core, the system provides a stateful actor model, allowing developers to define classes that run in dedicated processes to maintain and mutate internal state across remote method calls. The framework distinguishes itself through a robust cross-language interoperability layer, enabling f
A programming model that scales Python and Java applications across clusters by abstracting task scheduling and resource management.
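As a rough illustration of the stateful actor idea described for Ray, the sketch below uses only the Python standard library and does not use Ray's API. The `CounterActor` class and its mailbox are hypothetical names, and the actor runs in a dedicated thread rather than a dedicated process to keep the example short.

```python
import queue
import threading
from concurrent.futures import Future


class CounterActor:
    """Toy actor (hypothetical): owns its state and handles one message at a time."""

    def __init__(self):
        self._count = 0                      # state touched only by the worker thread
        self._mailbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:              # sentinel: stop the actor
                break
            amount, reply = message
            self._count += amount            # mutate internal state between calls
            reply.set_result(self._count)

    def increment(self, amount=1):
        """Enqueue a call and return a future for the updated count."""
        reply = Future()
        self._mailbox.put((amount, reply))
        return reply

    def stop(self):
        self._mailbox.put(None)
        self._worker.join()


actor = CounterActor()
futures = [actor.increment(2) for _ in range(5)]   # calls return immediately
results = [f.result() for f in futures]            # block until each call is done
actor.stop()
```

Because the mailbox is processed in order by a single worker, `results` is `[2, 4, 6, 8, 10]`: each call sees the state left behind by the previous one.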
Puter is a browser-based desktop environment and cloud-native development platform that provides a virtualized graphical workspace. It enables developers to build and deploy full-stack web applications by integrating cloud storage, authentication, and serverless backend logic directly into the browser, eliminating the need for traditional server infrastructure. The platform distinguishes itself through a unified cloud storage layer and a distributed network runtime that facilitates peer-to-peer communication and cross-origin resource fetching. It features a sophisticated cross-window orchestr
Provides a browser-native execution environment for peer-to-peer communication and decentralized applications.
Anoma is a distributed operating system designed to abstract the complexities of blockchain networks into a unified interface for cross-chain coordination. At its core, the platform utilizes a resource-based state machine and an intent-centric execution model, where user-defined goals are processed and settled by decentralized solvers rather than through direct, manual execution. This architecture enables the creation of applications that operate across heterogeneous distributed networks while maintaining a consistent developer and user experience. The platform distinguishes itself through a
Abstracts blockchain complexities to provide a unified interface for users and developers.
This project is a comprehensive microservices development framework designed to build scalable, resilient backend systems. It provides a production-ready runtime that integrates stability patterns directly into the service architecture, ensuring consistent performance and reliability for both web and remote procedure call services even under heavy traffic conditions. The framework centers on an interface-first development model, utilizing a domain-specific language to define service contracts that serve as the single source of truth. This approach powers an extensive code generation ecosystem
Provides a production-ready runtime environment designed for high performance and reliability under heavy network traffic.
Linera is a multi-chain smart contract platform designed for horizontal scalability through a microchain-based distributed ledger. By partitioning state into independent, parallel chains that share a common validator set, the protocol enables high-performance execution of modular applications. The system utilizes a WebAssembly-based runtime to ensure secure, platform-independent execution of contract logic across the network. The platform distinguishes itself through an asynchronous messaging framework that coordinates state changes between chains by queuing messages for execution in subseque
Interact with applications using operations for local chain execution and messages for cross-chain communication to ensure atomicity through bundled message groups.
Hyperframes is an HTML-to-video rendering engine and composition tool that transforms web layouts and CSS into encoded video files. It functions as a headless browser video pipeline and a distributed video rendering framework, allowing users to create seekable animations and programmatic motion designs using HTML, CSS, and JavaScript. The project differentiates itself as an AI agent video orchestrator, enabling the automation of video scripts and compositions through natural language prompts. It supports distributed video encoding by splitting rendering tasks across multiple serverless functi
Implements a cloud-native infrastructure for splitting video encoding tasks across serverless functions and worker processes.
Dapr is a distributed application runtime that provides a sidecar-based infrastructure layer for building resilient microservices and event-driven applications. By utilizing a sidecar proxy pattern, it abstracts complex infrastructure tasks into standardized, network-accessible APIs, allowing developers to focus on application logic while the runtime handles service discovery, state management, and secure communication. The platform distinguishes itself through a pluggable component architecture and language-agnostic design, enabling services written in any programming language to interact wi
Write distributed applications using language-specific tools that provide simple interfaces for interacting with runtime building blocks and underlying infrastructure services during the development process.
This project serves as a comprehensive, community-driven directory of high-quality open-source Python libraries and tools for machine learning, data science, and artificial intelligence. It functions as a centralized resource for developers to discover, evaluate, and track the maintenance status of software packages across the entire machine learning ecosystem. The platform distinguishes itself through automated popularity tracking and data-driven content curation, which programmatically validate and rank projects based on community activity and development velocity. By organizing these tools
Parallelizes training and inference workloads across large-scale compute infrastructure.
This project is a functional programming library and toolkit for building production TypeScript applications. It provides a system for managing concurrency, error handling, and resource lifecycles using functional effects. The project distinguishes itself through a comprehensive suite of specialized toolkits, including a dependency injection framework for decoupling service implementations, a workflow orchestrator for coordinating durable processes, and a SQL database toolkit for consistent data operations across multiple dialects. It also implements an OpenTelemetry instrumentation library f
Spreads heavy workloads across multiple worker nodes to process data in parallel.
Bullet3 is a professional physics simulation engine designed for calculating rigid body, soft body, and collision dynamics within 3D environments and robotics applications. It functions as a computational framework for determining complex geometric intersections and contact manifolds between objects in simulated space. The library distinguishes itself through a distributed rendering framework that scales heavy graphical workloads and scene generation tasks across large clusters of machines. This capability enables the production of massive datasets by distributing complex scene generation acr
Scales heavy graphical workloads and scene generation tasks across large clusters of machines.
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that organizes computational logic by representing tasks and their dependencies as directed acyclic graphs (DAGs). This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabling global graph optimization and efficient resource allocation. It includes memory-aware data spilling to prevent system crashes when processing datasets larger than available memory, and it uses task graph fusion to combine sequences of operations into single execution steps, reducing scheduling overhead and inter-node communication. The platform provides a broad capability surface for large-scale data analytics, including support for distributed machine learning, high-performance computing integration, and parallel data processing. It provides comprehensive tools for cluster lifecycle management, performance profiling, and real-time monitoring of task execution. Users can deploy these environments on diverse infrastructure, including local hardware, cloud providers, containerized systems, and high-performance computing clusters.
Provides a framework for scaling Python workflows from single machines to distributed clusters by orchestrating task graphs.
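The lazy, graph-based execution described for Dask can be pictured with a small standard-library sketch. This is not Dask's API: the `Lazy` class and the `load`, `double`, and `total` functions are hypothetical, and the example runs on one machine with no scheduler.

```python
class Lazy:
    """A node in a task graph: a function plus the nodes it depends on."""

    def __init__(self, func, *deps):
        self.func = func
        self.deps = deps

    def compute(self, cache=None):
        """Evaluate dependencies first, running each node at most once."""
        cache = {} if cache is None else cache
        if id(self) not in cache:
            args = [dep.compute(cache) for dep in self.deps]
            cache[id(self)] = self.func(*args)
        return cache[id(self)]


calls = []                                   # records which tasks actually ran


def load():
    calls.append("load")
    return [1, 2, 3, 4]


def double(xs):
    calls.append("double")
    return [2 * x for x in xs]


def total(xs, ys):
    calls.append("total")
    return sum(xs) + sum(ys)


source = Lazy(load)
doubled = Lazy(double, source)
result = Lazy(total, source, doubled)        # graph is built, nothing has run yet

calls_before_compute = list(calls)           # still empty at this point
value = result.compute()                     # work happens only on request
```

Building the graph runs no tasks; `compute()` then evaluates it in dependency order, and the shared `source` node runs once even though two tasks depend on it. Here `value` is `30`.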
Meshroom is a node-based photogrammetry software designed to transform collections of two-dimensional images into three-dimensional models and scene geometry. It provides a visual interface for constructing and managing modular data pipelines, allowing users to automate complex computer vision tasks such as feature extraction, depth map estimation, and mesh generation. The software distinguishes itself through a distributed computational framework that dispatches resource-intensive tasks across local hardware or remote render farms. By utilizing a directed acyclic graph execution model, it en
Dispatches resource-intensive reconstruction tasks across local hardware or remote render farms to optimize processing performance.
QuantAxis is a quantitative trading platform and algorithmic trading framework. It provides a comprehensive local environment for backtesting strategies, managing financial market data, and executing trades across stocks, futures, and options markets. The system distinguishes itself through a distributed task scheduler that spreads asynchronous computations and heavy mathematical workloads across a network of remote agents. It incorporates a multi-account trading interface to standardize the monitoring of positions and the execution of orders across various brokerage accounts. The platform c
Distributes asynchronous computational workloads across a local network of remote agents.
Metaflow is a Python machine learning framework and MLOps workflow orchestrator designed to manage the lifecycle of data pipelines from local prototyping to production. It serves as a distributed compute manager and an experiment tracking system, enabling the creation of reproducible pipelines that transition between development and high-availability production environments. The framework distinguishes itself through an integrated checkpointing system that automatically persists intermediate data artifacts to remote storage, allowing failed runs to be resumed from the last successful step. It
Distributes computational workloads across cloud CPUs and GPUs using ephemeral clusters and spot instances.
Hyperopt is a Python library for hyperparameter optimization designed to minimize scalar-valued objective functions. It operates as a stochastic search space engine that finds optimal input parameters by searching through real-valued, discrete, and conditional spaces. The framework distinguishes itself through its support for complex search space configurations, allowing for conditional parameter hierarchies where specific hyperparameters are sampled only if their parent parameters meet certain criteria. It is built as an asynchronous optimization framework, decoupling the generation of searc
Parallelizes the hyperparameter search process across multiple machines using external clusters or database backends.
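A conditional search space of the kind described for Hyperopt can be sketched with plain random search. The code below uses only the standard library, not Hyperopt's own API, and the parameter names (`kind`, `penalty`, `depth`) and the objective are invented for illustration.

```python
import random


def sample(rng):
    """Draw one configuration; child parameters exist only for their parent choice."""
    kind = rng.choice(["linear", "tree"])
    if kind == "linear":
        return {"kind": kind, "penalty": rng.uniform(0.0, 1.0)}
    return {"kind": kind, "depth": rng.randint(1, 8)}


def objective(config):
    """Hypothetical scalar loss to minimize."""
    if config["kind"] == "linear":
        return (config["penalty"] - 0.3) ** 2 + 0.1
    return abs(config["depth"] - 4) * 0.05


def random_search(trials, seed=0):
    """Sample `trials` configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(trials)]
    return min(candidates, key=objective)


best = random_search(200)
```

Each sampled configuration carries either `penalty` or `depth`, never both, which is the sense in which a hyperparameter is sampled only when its parent takes a particular value. Since every trial is independent, the sampling loop is also the part that could be spread across workers.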
ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself
Executes parallel or distributed computing tasks by initializing frameworks like Spark, Ray, or Dask directly within pipeline steps.
Apache Mesos is a distributed systems kernel and cluster resource manager that abstracts CPU, memory, and storage across a pool of nodes. It functions as a distributed infrastructure orchestrator, providing a layer to run multiple orchestration frameworks on a shared set of physical or virtual machines. The system acts as a resource isolation engine, dividing a shared cluster into isolated containers to run diverse workloads concurrently. It enables multi-framework orchestration, allowing different distributed application frameworks to share a single infrastructure to maximize hardware utiliz
Provides a distributed infrastructure for running multiple computing frameworks across networked machines.
Volcano is a Kubernetes-native batch scheduler specialized for AI, machine learning, and high-performance computing workloads. It provides gang scheduling to atomically allocate resources for all tasks of a distributed job, preventing deadlocks from partial allocation, and supports hierarchical queue management for multi-tenant resource isolation with configurable quotas, borrowing, and preemption. Topology-aware placement optimizes communication-intensive workloads by modeling network hierarchy to minimize cross-switch latency. Volcano differentiates itself with automated orchestration of di
Runs batch jobs from popular data processing, ML, and streaming frameworks without custom integration.
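The all-or-nothing behaviour of gang scheduling described for Volcano can be sketched as follows. This is a simplified, hypothetical model rather than Volcano's implementation: the cluster is a plain dictionary of free CPUs, and placement uses a greedy first-fit rule that may reject some jobs a smarter packer could place.

```python
def gang_schedule(free_cpus, task_cpus):
    """Place every task of a job or none of them.

    free_cpus: dict of node name -> free CPU count (hypothetical cluster state)
    task_cpus: list of CPU requests, one per task in the job
    Returns a list of node names (one per task), or None if the job does not fit.
    """
    remaining = dict(free_cpus)          # work on a copy so failure leaves no trace
    placement = []
    for request in task_cpus:
        # First-fit: pick the first node with enough spare capacity.
        node = next((n for n, free in remaining.items() if free >= request), None)
        if node is None:
            return None                  # partial allocation is discarded
        remaining[node] -= request
        placement.append(node)
    free_cpus.update(remaining)          # commit only after every task fits
    return placement


cluster = {"node-a": 4, "node-b": 2}
rejected = gang_schedule(cluster, [4, 2, 2])   # third task has nowhere to go
after_rejection = dict(cluster)                # capacity is untouched
accepted = gang_schedule(cluster, [2, 2, 2])   # all three tasks fit
```

The first job is rejected as a whole and reserves nothing, so the second job can still claim the full cluster. Without the all-or-nothing rule, the first job's two placed tasks would hold capacity while waiting for a third slot that never appears.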
statsforecast is a high-performance statistical time series forecasting library designed to generate point forecasts and prediction intervals. It functions as a distributed time series framework that utilizes a C-based forecasting engine and an automated model selector to identify and fit the optimal statistical model for every unique series in a dataset. The system also includes a time series anomaly detector to identify unusual data points by comparing observed values against probabilistic forecast intervals. The project is distinguished by its ability to handle massive-scale parallel forec
Scales forecasting workloads across server clusters using distributed computing and parallel execution.