26 repositories
Systems designed to distribute computational workloads across multiple networked machines.
Distinguishing note: Focuses on workload distribution and parallel processing across a cluster rather than general cluster management.
Explore 26 awesome GitHub repositories matching devops & infrastructure · Distributed Computing Frameworks.
Exo is a distributed inference engine designed to run machine learning models across local hardware. It functions as a network orchestration layer that automatically discovers available devices to form a unified computing cluster, allowing users to scale artificial intelligence workloads by distributing computational tasks across multiple machines. The platform distinguishes itself through its ability to manage the entire lifecycle of local models while providing a standardized gateway for external applications. By translating local model outputs into industry-standard formats, it enables exi
Distributes large computational workloads across multiple local devices to improve processing performance.
Ray is a distributed computing framework designed to scale Python and Java applications across clusters by abstracting task scheduling and resource management. It functions as a resource-aware execution engine that manages task dependencies, placement, and fault tolerance across networked compute nodes. At its core, the system provides a stateful actor model, allowing developers to define classes that run in dedicated processes to maintain and mutate internal state across remote method calls. The framework distinguishes itself through a robust cross-language interoperability layer, enabling f
A programming model that scales Python and Java applications across clusters by abstracting task scheduling and resource management.
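The stateful actor model mentioned in the Ray entry can be illustrated with a small standard-library sketch. This is not Ray's API: `ActorHandle`, `call`, and `Counter` are hypothetical names, and a single dedicated worker thread stands in for the dedicated process an actor would normally run in. The point it shows is that every method call on one actor is serialized through one worker, so the actor's internal state can be mutated safely across calls.

```python
from concurrent.futures import ThreadPoolExecutor


class Counter:
    """Plain class whose instance holds mutable state."""

    def __init__(self):
        self.value = 0

    def increment(self, by=1):
        self.value += by
        return self.value


class ActorHandle:
    """Hypothetical handle: runs every call for one instance on a single worker thread."""

    def __init__(self, cls, *args, **kwargs):
        # One worker means calls are executed one at a time, in submission order.
        self._executor = ThreadPoolExecutor(max_workers=1)
        # Construct the instance on the worker so all of its state lives there.
        self._instance = self._executor.submit(cls, *args, **kwargs).result()

    def call(self, method, *args, **kwargs):
        # Returns a Future immediately; the caller decides when to wait for the result.
        return self._executor.submit(getattr(self._instance, method), *args, **kwargs)

    def stop(self):
        self._executor.shutdown()


handle = ActorHandle(Counter)
futures = [handle.call("increment") for _ in range(5)]
results = [f.result() for f in futures]
handle.stop()
print(results)  # [1, 2, 3, 4, 5]
```

The caller never touches `Counter` directly; it only holds the handle and futures, which is the property that lets a real framework place the instance on another machine.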
Puter is a browser-based desktop environment and cloud-native development platform that provides a virtualized graphical workspace. It enables developers to build and deploy full-stack web applications by integrating cloud storage, authentication, and serverless backend logic directly into the browser, eliminating the need for traditional server infrastructure. The platform distinguishes itself through a unified cloud storage layer and a distributed network runtime that facilitates peer-to-peer communication and cross-origin resource fetching. It features a sophisticated cross-window orchestr
Provides a browser-native execution environment for peer-to-peer communication and decentralized applications.
Anoma is a distributed operating system designed to abstract the complexities of blockchain networks into a unified interface for cross-chain coordination. At its core, the platform utilizes a resource-based state machine and an intent-centric execution model, where user-defined goals are processed and settled by decentralized solvers rather than through direct, manual execution. This architecture enables the creation of applications that operate across heterogeneous distributed networks while maintaining a consistent developer and user experience. The platform distinguishes itself through a
Abstracts blockchain complexities to provide a unified interface for users and developers.
This project is a comprehensive microservices development framework designed to build scalable, resilient backend systems. It provides a production-ready runtime that integrates stability patterns directly into the service architecture, ensuring consistent performance and reliability for both web and remote procedure call services even under heavy traffic conditions. The framework centers on an interface-first development model, utilizing a domain-specific language to define service contracts that serve as the single source of truth. This approach powers an extensive code generation ecosystem
Provides a production-ready runtime environment designed for high performance and reliability under heavy network traffic.
Linera is a multi-chain smart contract platform designed for horizontal scalability through a microchain-based distributed ledger. By partitioning state into independent, parallel chains that share a common validator set, the protocol enables high-performance execution of modular applications. The system utilizes a WebAssembly-based runtime to ensure secure, platform-independent execution of contract logic across the network. The platform distinguishes itself through an asynchronous messaging framework that coordinates state changes between chains by queuing messages for execution in subseque
Interact with applications using operations for local chain execution and messages for cross-chain communication to ensure atomicity through bundled message groups.
Hyperframes is an HTML-to-video rendering engine and composition tool that transforms web layouts and CSS into encoded video files. It functions as a headless browser video pipeline and a distributed video rendering framework, allowing users to create seekable animations and programmatic motion designs using HTML, CSS, and JavaScript. The project differentiates itself as an AI agent video orchestrator, enabling the automation of video scripts and compositions through natural language prompts. It supports distributed video encoding by splitting rendering tasks across multiple serverless functi
Implements a cloud-native infrastructure for splitting video encoding tasks across serverless functions and worker processes.
Dapr is a distributed application runtime that provides a sidecar-based infrastructure layer for building resilient microservices and event-driven applications. By utilizing a sidecar proxy pattern, it abstracts complex infrastructure tasks into standardized, network-accessible APIs, allowing developers to focus on application logic while the runtime handles service discovery, state management, and secure communication. The platform distinguishes itself through a pluggable component architecture and language-agnostic design, enabling services written in any programming language to interact wi
Write distributed applications using language-specific tools that provide simple interfaces for interacting with runtime building blocks and underlying infrastructure services during the development process.
This project serves as a comprehensive, community-driven directory of high-quality open-source Python libraries and tools for machine learning, data science, and artificial intelligence. It functions as a centralized resource for developers to discover, evaluate, and track the maintenance status of software packages across the entire machine learning ecosystem. The platform distinguishes itself through automated popularity tracking and data-driven content curation, which programmatically validate and rank projects based on community activity and development velocity. By organizing these tools
Parallelizes training and inference workloads across large-scale compute infrastructure.
This project is a functional programming library and toolkit for building production TypeScript applications. It provides a system for managing concurrency, error handling, and resource lifecycles using functional effects. The project distinguishes itself through a comprehensive suite of specialized toolkits, including a dependency injection framework for decoupling service implementations, a workflow orchestrator for coordinating durable processes, and a SQL database toolkit for consistent data operations across multiple dialects. It also implements an OpenTelemetry instrumentation library f
Spreads heavy workloads across multiple worker nodes to process data in parallel.
Bullet3 is a professional physics simulation engine designed for calculating rigid body, soft body, and collision dynamics within 3D environments and robotics applications. It functions as a computational framework for determining complex geometric intersections and contact manifolds between objects in simulated space. The library distinguishes itself through a distributed rendering framework that scales heavy graphical workloads and scene generation tasks across large clusters of machines. This capability enables the production of massive datasets by distributing complex scene generation acr
Scales heavy graphical workloads and scene generation tasks across large clusters of machines.
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture lets the system automate the distribution of workloads across available hardware while handling complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabling global graph optimization and efficient resource allocation. It incorporates memory-aware data spilling to prevent system crashes when processing datasets that exceed available memory, and uses task graph fusion to combine sequences of operations into single execution steps, minimizing scheduling overhead and inter-node communication. The platform offers a comprehensive surface of capabilities for large-scale data analysis, including support for distributed machine learning, high-performance computing integration, and parallel data processing. It provides extensive tools for cluster lifecycle management, performance profiling, and real-time monitoring of task execution. Users can deploy these environments on diverse infrastructure, including local hardware, cloud providers, containerized systems, and high-performance computing clusters.
Provides a framework for scaling Python workflows from single machines to distributed clusters by orchestrating task graphs.
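To make the idea of a lazily evaluated task graph concrete, here is a minimal standard-library sketch. It does not use Dask or reproduce its API: the graph format (a dict mapping a key to a `(function, dependencies)` pair) and the `run_graph` helper are assumptions made for illustration. Nothing runs when the graph is declared, and only the tasks that the requested key depends on are executed.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_graph(graph, target):
    """Evaluate `target`, executing only the tasks it transitively depends on."""
    # Walk backwards from the target to collect the required subgraph.
    needed = {}
    stack = [target]
    while stack:
        key = stack.pop()
        if key in needed:
            continue
        _, deps = graph[key]
        needed[key] = set(deps)
        stack.extend(deps)

    # Execute in dependency order, feeding each task the results of its inputs.
    results = {}
    for key in TopologicalSorter(needed).static_order():
        func, deps = graph[key]
        results[key] = func(*(results[d] for d in deps))
    return results[target]


# Declaring the graph runs nothing; "unused" would raise if it were ever executed.
graph = {
    "load": (lambda: [1, 2, 3, 4], ()),
    "square": (lambda xs: [x * x for x in xs], ("load",)),
    "total": (lambda xs: sum(xs), ("square",)),
    "unused": (lambda: 1 / 0, ()),
}

total = run_graph(graph, "total")
print(total)  # 30
```

A real scheduler would additionally run independent tasks in parallel and place them on different workers; this sketch executes them sequentially in one process.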
Meshroom is a node-based photogrammetry software designed to transform collections of two-dimensional images into three-dimensional models and scene geometry. It provides a visual interface for constructing and managing modular data pipelines, allowing users to automate complex computer vision tasks such as feature extraction, depth map estimation, and mesh generation. The software distinguishes itself through a distributed computational framework that dispatches resource-intensive tasks across local hardware or remote render farms. By utilizing a directed acyclic graph execution model, it en
Dispatches resource-intensive reconstruction tasks across local hardware or remote render farms to optimize processing performance.
QuantAxis is a quantitative trading platform and algorithmic trading framework. It provides a comprehensive local environment for backtesting strategies, managing financial market data, and executing trades across stocks, futures, and options markets. The system distinguishes itself through a distributed task scheduler that spreads asynchronous computations and heavy mathematical workloads across a network of remote agents. It incorporates a multi-account trading interface to standardize the monitoring of positions and the execution of orders across various brokerage accounts. The platform c
Distributes asynchronous computational workloads across a local network of remote agents.
Metaflow is a Python machine learning framework and MLOps workflow orchestrator designed to manage the lifecycle of data pipelines from local prototyping to production. It serves as a distributed compute manager and an experiment tracking system, enabling the creation of reproducible pipelines that transition between development and high-availability production environments. The framework distinguishes itself through an integrated checkpointing system that automatically persists intermediate data artifacts to remote storage, allowing failed runs to be resumed from the last successful step. It
Distributes computational workloads across cloud CPUs and GPUs using ephemeral clusters and spot instances.
Hyperopt is a Python library for hyperparameter optimization designed to minimize scalar-valued objective functions. It operates as a stochastic search space engine that finds optimal input parameters by searching through real-valued, discrete, and conditional spaces. The framework distinguishes itself through its support for complex search space configurations, allowing for conditional parameter hierarchies where specific hyperparameters are sampled only if their parent parameters meet certain criteria. It is built as an asynchronous optimization framework, decoupling the generation of searc
Parallelizes the hyperparameter search process across multiple machines using external clusters or database backends.
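A conditional search space, where a hyperparameter is sampled only when its parent takes a particular value, can be sketched with the standard library alone. This is not Hyperopt's API and it uses plain random search rather than any of the library's own algorithms; the parameter names (`model`, `C`, `n_trees`) and the toy `objective` are hypothetical.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration; child parameters exist only under their parent choice."""
    config = {"model": rng.choice(["svm", "forest"])}
    if config["model"] == "svm":
        config["C"] = 10 ** rng.uniform(-2, 2)    # real-valued, log-uniform
    else:
        config["n_trees"] = rng.randint(10, 200)  # discrete
    return config


def objective(config):
    """Hypothetical scalar loss to minimize (stands in for a validation error)."""
    if config["model"] == "svm":
        return abs(math.log10(config["C"]) - 0.5)
    return abs(config["n_trees"] - 120) / 100


def random_search(n_trials, seed=0):
    """Sample n_trials configurations and return the one with the lowest loss."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    return min(trials, key=objective)


best = random_search(50)
print(best, objective(best))
```

Because each call to `objective` is independent of the others, the list of sampled configurations could be handed to separate workers for evaluation, which is the property that makes this kind of search easy to parallelize.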
ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself
Executes parallel or distributed computing tasks by initializing frameworks like Spark, Ray, or Dask directly within pipeline steps.
Apache Mesos is a distributed systems kernel and cluster resource manager that abstracts CPU, memory, and storage across a pool of nodes. It functions as a distributed infrastructure orchestrator, providing a layer for running multiple orchestration frameworks on a shared set of physical or virtual machines. The system acts as a resource isolation engine, dividing a shared cluster into isolated containers to run diverse workloads simultaneously. It enables multi-framework orchestration, allowing different distributed application frameworks to share a single infrastructure to maximize hardware utilization. The project covers large-scale compute distribution and distributed cluster management. Its capabilities include distributed resource management and compute isolation across multiple applications to prevent interference and ensure stable performance on shared servers.
Provides a distributed infrastructure for running multiple computing frameworks across networked machines.
Volcano is a Kubernetes-native batch scheduler specialized for AI, machine learning, and high-performance computing workloads. It provides gang scheduling to atomically allocate resources for all tasks of a distributed job, preventing deadlocks from partial allocation, and supports hierarchical queue management for multi-tenant resource isolation with configurable quotas, borrowing, and preemption. Topology-aware placement optimizes communication-intensive workloads by modeling network hierarchy to minimize cross-switch latency. Volcano differentiates itself with automated orchestration of di
Runs batch jobs from popular data processing, ML, and streaming frameworks without custom integration.
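The gang scheduling behavior described for Volcano, where either every task of a job receives resources or none does, can be sketched as follows. This is a simplified illustration, not Volcano's implementation: `gang_schedule`, the slot-count resource model, and the first-fit placement rule are all assumptions, and a greedy first-fit pass can reject a job that a smarter packing would have placed.

```python
def gang_schedule(job_tasks, free_slots):
    """All-or-nothing placement.

    job_tasks:  {task_name: slots_needed}
    free_slots: {node_name: free_slot_count}, updated in place only on success.
    Returns {task_name: node_name}, or None if the whole job cannot be placed.
    """
    remaining = dict(free_slots)  # tentative copy, so failure leaves no partial allocation
    placement = {}
    for task, need in job_tasks.items():
        # First-fit: pick the first node with enough free slots for this task.
        node = next((n for n, free in remaining.items() if free >= need), None)
        if node is None:
            return None  # one task does not fit, so the whole job is rejected
        remaining[node] -= need
        placement[task] = node
    free_slots.update(remaining)  # commit only after every task has a node
    return placement


cluster = {"node-a": 4, "node-b": 2}

# The second task cannot fit anywhere, so nothing is reserved for the first one either.
rejected = gang_schedule({"t0": 3, "t1": 3}, cluster)
print(rejected, cluster)  # None {'node-a': 4, 'node-b': 2}

placed = gang_schedule({"worker-0": 2, "worker-1": 2, "ps-0": 2}, cluster)
print(placed, cluster)
```

Working on a tentative copy is what prevents the partial-allocation deadlock: a job that cannot start never holds slots that another job could have used.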
statsforecast is a high-performance statistical time series forecasting library designed to generate point forecasts and prediction intervals. It functions as a distributed time series framework that uses a C-based forecasting engine and an automatic model selector to identify and fit the optimal statistical model for each individual series in a dataset. The system also includes a time series anomaly detector that identifies unusual data points by comparing observed values against probabilistic forecast intervals. The project distinguishes itself through its ability to handle massively parallel forecasting for millions of individual series. It achieves this through a distributed computing framework, multi-core parallel execution, and compiled C kernels that accelerate the core ARIMA and exponential smoothing logic. The system further optimizes large-scale processing using a long-format data layout and a lazy-evaluation data pipeline to reduce memory overhead. The library offers a comprehensive suite of models, including AutoARIMA, various exponential smoothing methods for intermittent or seasonal demand, Theta decomposition, and GARCH volatility modeling for financial risk. It covers broader capability areas such as multivariate forecasting with exogenous variables, time series decomposition, and model evaluation through historical cross-validation and sliding window analysis. The library integrates with high-performance data structures such as Polars and provides utilities for serving saved models as REST endpoints for network-accessible predictions.
Scales forecasting workloads across server clusters using distributed computing and parallel execution.
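Interval-based anomaly detection, as described in the statsforecast entry, amounts to flagging observations that fall outside a forecast's prediction interval. The sketch below is not the statsforecast API. It assumes a deliberately naive forecaster (the historical mean, with a normal-approximation interval of mean ± z standard deviations), and `interval_anomalies` is a hypothetical helper name.

```python
import statistics


def interval_anomalies(history, observed, z=1.96):
    """Return indices of observed values outside the naive prediction interval."""
    center = statistics.fmean(history)   # naive point forecast: historical mean
    spread = statistics.stdev(history)   # sample standard deviation of the history
    lower, upper = center - z * spread, center + z * spread
    return [i for i, y in enumerate(observed) if not lower <= y <= upper]


history = [10, 12, 11, 9, 10, 11, 10, 9]
observed = [10, 11, 25, 9, 2]

anomalies = interval_anomalies(history, observed)
print(anomalies)  # [2, 4]
```

Swapping in a real model changes only how `center`, `lower`, and `upper` are produced; the comparison step stays the same.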