25 Repos
Systems designed to distribute computational workloads across multiple networked machines.
Distinguishing note: Focuses on workload distribution and parallel processing across a cluster rather than general cluster management.
25 GitHub repositories in DevOps & Infrastructure · Distributed Computing Frameworks.
Exo is a distributed inference engine designed to run machine learning models across local hardware. It functions as a network orchestration layer that automatically discovers available devices to form a unified computing cluster, allowing users to scale artificial intelligence workloads by distributing computational tasks across multiple machines. The platform distinguishes itself through its ability to manage the entire lifecycle of local models while providing a standardized gateway for external applications. By translating local model outputs into industry-standard formats, it enables exi
Distributes large computational workloads across multiple local devices to improve processing performance.
Ray is a distributed computing framework designed to scale Python and Java applications across clusters by abstracting task scheduling and resource management. It functions as a resource-aware execution engine that manages task dependencies, placement, and fault tolerance across networked compute nodes. At its core, the system provides a stateful actor model, allowing developers to define classes that run in dedicated processes to maintain and mutate internal state across remote method calls. The framework distinguishes itself through a robust cross-language interoperability layer, enabling f
A programming model that scales Python and Java applications across clusters by abstracting task scheduling and resource management.
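The stateful actor model mentioned in the Ray entry can be illustrated with a minimal standard-library sketch. This is not Ray's API: `ActorHandle`, `call`, and `stop` are hypothetical names, and a single worker thread stands in for the dedicated process. The point it shows is that calls queue up in a mailbox and run one at a time, so state persists and mutates safely across calls.

```python
import queue
import threading
from concurrent.futures import Future


class Counter:
    """Plain class whose state should live in exactly one worker."""

    def __init__(self):
        self.value = 0

    def increment(self, amount=1):
        self.value += amount
        return self.value


class ActorHandle:
    """Hypothetical handle that forwards method calls to one worker thread."""

    def __init__(self, cls, *args, **kwargs):
        self._mailbox = queue.Queue()
        self._instance = cls(*args, **kwargs)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Calls are processed one at a time, so state mutations never race.
        while True:
            item = self._mailbox.get()
            if item is None:  # sentinel: shut the worker down
                break
            method, args, kwargs, future = item
            try:
                future.set_result(getattr(self._instance, method)(*args, **kwargs))
            except Exception as exc:
                future.set_exception(exc)

    def call(self, method, *args, **kwargs):
        """Enqueue a method call and return a future for its result."""
        future = Future()
        self._mailbox.put((method, args, kwargs, future))
        return future

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()


counter = ActorHandle(Counter)
futures = [counter.call("increment") for _ in range(5)]
results = [f.result() for f in futures]
counter.stop()
print(results)  # [1, 2, 3, 4, 5]
```

Because the caller only holds futures, it can issue many calls without blocking and collect results later, which is the property that makes the pattern useful across machines.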
Puter is a browser-based desktop environment and cloud-native development platform that provides a virtualized graphical workspace. It enables developers to build and deploy full-stack web applications by integrating cloud storage, authentication, and serverless backend logic directly into the browser, eliminating the need for traditional server infrastructure. The platform distinguishes itself through a unified cloud storage layer and a distributed network runtime that facilitates peer-to-peer communication and cross-origin resource fetching. It features a sophisticated cross-window orchestr
Provides a browser-native execution environment for peer-to-peer communication and decentralized applications.
Anoma is a distributed operating system designed to abstract the complexities of blockchain networks into a unified interface for cross-chain coordination. At its core, the platform utilizes a resource-based state machine and an intent-centric execution model, where user-defined goals are processed and settled by decentralized solvers rather than through direct, manual execution. This architecture enables the creation of applications that operate across heterogeneous distributed networks while maintaining a consistent developer and user experience. The platform distinguishes itself through a
Abstracts blockchain complexities to provide a unified interface for users and developers.
This project is a comprehensive microservices development framework designed to build scalable, resilient backend systems. It provides a production-ready runtime that integrates stability patterns directly into the service architecture, ensuring consistent performance and reliability for both web and remote procedure call services even under heavy traffic conditions. The framework centers on an interface-first development model, utilizing a domain-specific language to define service contracts that serve as the single source of truth. This approach powers an extensive code generation ecosystem
Provides a production-ready runtime environment designed for high performance and reliability under heavy network traffic.
Linera is a multi-chain smart contract platform designed for horizontal scalability through a microchain-based distributed ledger. By partitioning state into independent, parallel chains that share a common validator set, the protocol enables high-performance execution of modular applications. The system utilizes a WebAssembly-based runtime to ensure secure, platform-independent execution of contract logic across the network. The platform distinguishes itself through an asynchronous messaging framework that coordinates state changes between chains by queuing messages for execution in subseque
Drives applications through operations, which execute on the local chain, and messages, which carry cross-chain communication; bundled message groups ensure atomicity.
Hyperframes is an HTML-to-video rendering engine and composition tool that transforms web layouts and CSS into encoded video files. It functions as a headless browser video pipeline and a distributed video rendering framework, allowing users to create seekable animations and programmatic motion designs using HTML, CSS, and JavaScript. The project differentiates itself as an AI agent video orchestrator, enabling the automation of video scripts and compositions through natural language prompts. It supports distributed video encoding by splitting rendering tasks across multiple serverless functi
Implements a cloud-native infrastructure for splitting video encoding tasks across serverless functions and worker processes.
Dapr is a distributed application runtime that provides a sidecar-based infrastructure layer for building resilient microservices and event-driven applications. By utilizing a sidecar proxy pattern, it abstracts complex infrastructure tasks into standardized, network-accessible APIs, allowing developers to focus on application logic while the runtime handles service discovery, state management, and secure communication. The platform distinguishes itself through a pluggable component architecture and language-agnostic design, enabling services written in any programming language to interact wi
Provides language-specific tools with simple interfaces to the runtime's building blocks and the underlying infrastructure services.
This project serves as a comprehensive, community-driven directory of high-quality open-source Python libraries and tools for machine learning, data science, and artificial intelligence. It functions as a centralized resource for developers to discover, evaluate, and track the maintenance status of software packages across the entire machine learning ecosystem. The platform distinguishes itself through automated popularity tracking and data-driven content curation, which programmatically validate and rank projects based on community activity and development velocity. By organizing these tools
Parallelizes training and inference workloads across large-scale compute infrastructure.
This project is a functional programming library and toolkit for building production TypeScript applications. It provides a system for managing concurrency, error handling, and resource lifecycles using functional effects. The project distinguishes itself through a comprehensive suite of specialized toolkits, including a dependency injection framework for decoupling service implementations, a workflow orchestrator for coordinating durable processes, and a SQL database toolkit for consistent data operations across multiple dialects. It also implements an OpenTelemetry instrumentation library f
Spreads heavy workloads across multiple worker nodes to process data in parallel.
Bullet3 is a professional physics simulation engine designed for calculating rigid body, soft body, and collision dynamics within 3D environments and robotics applications. It functions as a computational framework for determining complex geometric intersections and contact manifolds between objects in simulated space. The library distinguishes itself through a distributed rendering framework that scales heavy graphical workloads and scene generation tasks across large clusters of machines. This capability enables the production of massive datasets by distributing complex scene generation acr
Scales heavy graphical workloads and scene generation tasks across large clusters of machines.
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computation logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabling global graph optimization and efficient resource allocation. It integrates memory-aware data spilling to prevent crashes when processing datasets that exceed available memory, and uses task graph fusion to combine sequences of operations into single execution steps, minimizing scheduling overhead and inter-node communication. The platform offers a comprehensive surface for large-scale data analysis, including support for distributed machine learning, integration with high-performance computing, and parallel data processing. It provides extensive tooling for cluster lifecycle management, performance profiling, and real-time monitoring of task execution. Users can deploy these environments across a variety of infrastructures, including local hardware, cloud providers, containerized systems, and high-performance computing clusters.
Provides a framework for scaling Python workflows from single machines to distributed clusters by orchestrating task graphs.
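The lazy, graph-based execution described in the Dask entry can be sketched in a few lines of plain Python. This is an illustration of the idea only, not Dask's API: the `Lazy` class and the `load`, `square`, and `total` functions are hypothetical. Building the graph runs nothing; `compute()` walks the dependencies and evaluates each node once.

```python
class Lazy:
    """Hypothetical deferred node: a function plus the nodes it depends on."""

    def __init__(self, func, *deps):
        self.func = func
        self.deps = deps

    def compute(self, cache=None):
        # Memoize per node so a shared dependency is evaluated only once.
        cache = {} if cache is None else cache
        if id(self) not in cache:
            args = [d.compute(cache) if isinstance(d, Lazy) else d for d in self.deps]
            cache[id(self)] = self.func(*args)
        return cache[id(self)]


calls = []  # records which tasks actually ran, in order


def load(n):
    calls.append("load")
    return list(range(n))


def square(xs):
    calls.append("square")
    return [x * x for x in xs]


def total(a, b):
    calls.append("total")
    return sum(a) + sum(b)


# Building the graph executes nothing yet.
data = Lazy(load, 4)
squared = Lazy(square, data)
result = Lazy(total, data, squared)
calls_before = list(calls)

value = result.compute()
print(calls_before, value, calls)  # [] 20 ['load', 'square', 'total']
```

Seeing the whole graph before running it is what makes optimizations such as fusing adjacent steps possible; this sketch only deduplicates the shared `data` node.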
Meshroom is a node-based photogrammetry software designed to transform collections of two-dimensional images into three-dimensional models and scene geometry. It provides a visual interface for constructing and managing modular data pipelines, allowing users to automate complex computer vision tasks such as feature extraction, depth map estimation, and mesh generation. The software distinguishes itself through a distributed computational framework that dispatches resource-intensive tasks across local hardware or remote render farms. By utilizing a directed acyclic graph execution model, it en
Dispatches resource-intensive reconstruction tasks across local hardware or remote render farms to optimize processing performance.
QuantAxis is a quantitative trading platform and algorithmic trading framework. It provides a comprehensive local environment for backtesting strategies, managing financial market data, and executing trades across stocks, futures, and options markets. The system distinguishes itself through a distributed task scheduler that spreads asynchronous computations and heavy mathematical workloads across a network of remote agents. It incorporates a multi-account trading interface to standardize the monitoring of positions and the execution of orders across various brokerage accounts. The platform c
Distributes asynchronous computational workloads across a network of remote agents.
Metaflow is a Python machine learning framework and MLOps workflow orchestrator designed to manage the lifecycle of data pipelines from local prototyping to production. It serves as a distributed compute manager and an experiment tracking system, enabling the creation of reproducible pipelines that transition between development and high-availability production environments. The framework distinguishes itself through an integrated checkpointing system that automatically persists intermediate data artifacts to remote storage, allowing failed runs to be resumed from the last successful step. It
Distributes computational workloads across cloud CPUs and GPUs using ephemeral clusters and spot instances.
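The resume-from-last-successful-step behavior described in the Metaflow entry can be sketched as follows. This is a simplified illustration, not Metaflow's implementation: `run_pipeline` and the step functions are hypothetical, a temporary local directory stands in for remote storage, and artifacts are stored as JSON.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, store):
    """Run steps in order, persisting each output; reuse artifacts that already exist."""
    executed = []
    value = None
    for name, func in steps:
        artifact = store / f"{name}.json"
        if artifact.exists():
            # Step finished in an earlier run: load its artifact instead of recomputing.
            value = json.loads(artifact.read_text())
            continue
        value = func(value)
        artifact.write_text(json.dumps(value))
        executed.append(name)
    return value, executed


attempts = {"train": 0}


def load(_):
    return [1, 2, 3]


def train(data):
    attempts["train"] += 1
    if attempts["train"] == 1:
        raise RuntimeError("simulated transient failure")
    return sum(data)


def report(total):
    return {"total": total}


steps = [("load", load), ("train", train), ("report", report)]
store = Path(tempfile.mkdtemp())  # assumption: local stand-in for remote storage

try:
    run_pipeline(steps, store)  # first run fails inside "train"
except RuntimeError:
    pass

result, executed = run_pipeline(steps, store)  # second run skips "load"
print(result, executed)  # {'total': 6} ['train', 'report']
```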
Hyperopt is a Python library for hyperparameter optimization designed to minimize scalar-valued objective functions. It operates as a stochastic search space engine that finds optimal input parameters by searching through real-valued, discrete, and conditional spaces. The framework distinguishes itself through its support for complex search space configurations, allowing for conditional parameter hierarchies where specific hyperparameters are sampled only if their parent parameters meet certain criteria. It is built as an asynchronous optimization framework, decoupling the generation of searc
Parallelizes the hyperparameter search process across multiple machines using external clusters or database backends.
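A conditional search space of the kind described in the Hyperopt entry can be sketched with the standard library alone. This does not use Hyperopt's API and substitutes plain random search for its optimization algorithms; `sample_space`, `objective`, and `random_search` are hypothetical, and the loss function is made up for illustration. The child parameters `C` and `max_depth` are sampled only when their parent choice selects them.

```python
import math
import random


def sample_space(rng):
    """Draw one configuration; child parameters depend on the chosen model."""
    model = rng.choice(["svm", "tree"])
    if model == "svm":
        # Log-uniform draw for C, sampled only on the "svm" branch.
        log_c = rng.uniform(math.log(1e-3), math.log(1e3))
        return {"model": "svm", "C": math.exp(log_c)}
    # max_depth exists only on the "tree" branch.
    return {"model": "tree", "max_depth": rng.randint(1, 10)}


def objective(params):
    """Made-up scalar loss, used only to exercise the search loop."""
    if params["model"] == "svm":
        return abs(math.log10(params["C"]))
    return abs(params["max_depth"] - 4) / 10 + 0.05


def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    trials = [sample_space(rng) for _ in range(n_trials)]
    return min(trials, key=objective), trials


best, trials = random_search(50)
print(best["model"], round(objective(best), 4))
```

Each trial is generated independently of the others, so the evaluations could be handed to separate workers, which is the property the entry's one-line summary relies on.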
ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself
Executes parallel or distributed computing tasks by initializing frameworks like Spark, Ray, or Dask directly within pipeline steps.
Apache Mesos is a distributed systems kernel and cluster resource manager that abstracts CPU, memory, and storage across a pool of nodes. It functions as an orchestrator for distributed infrastructure, providing a layer for running multiple orchestration frameworks on a shared set of physical or virtual machines. The system acts as a resource isolation engine that partitions a shared cluster into isolated containers to run diverse workloads concurrently. It enables multi-framework orchestration, allowing different distributed application frameworks to share a single infrastructure and maximize hardware utilization. The project covers large-scale distribution of compute and the management of distributed clusters. Its features include managing distributed resources and isolating compute across multiple applications to prevent interference and ensure stable performance on shared servers.
Provides a distributed infrastructure for running multiple computing frameworks across networked machines.
Volcano is a Kubernetes-native batch scheduler specialized for AI, machine learning, and high-performance computing workloads. It provides gang scheduling to atomically allocate resources for all tasks of a distributed job, preventing deadlocks from partial allocation, and supports hierarchical queue management for multi-tenant resource isolation with configurable quotas, borrowing, and preemption. Topology-aware placement optimizes communication-intensive workloads by modeling network hierarchy to minimize cross-switch latency. Volcano differentiates itself with automated orchestration of di
Runs batch jobs from popular data processing, ML, and streaming frameworks without custom integration.
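Gang scheduling, as described in the Volcano entry, means a job is placed only if every one of its tasks fits at the same time. The sketch below shows that all-or-nothing rule with a simple first-fit placement over CPU counts. It is not Volcano's algorithm: `gang_schedule`, the node names, and the CPU-only resource model are assumptions made for illustration.

```python
def gang_schedule(task_cpus, free_cpus):
    """Place every task or none.

    task_cpus: list of CPU demands, one per task.
    free_cpus: dict mapping node name to free CPUs; updated only on success.
    Returns {task_index: node} or None if the whole gang cannot fit.
    """
    tentative = dict(free_cpus)  # work on a copy so a failure leaves no trace
    placement = {}
    # Place the largest tasks first; first-fit over nodes in insertion order.
    order = sorted(range(len(task_cpus)), key=lambda i: task_cpus[i], reverse=True)
    for i in order:
        node = next((n for n, free in tentative.items() if free >= task_cpus[i]), None)
        if node is None:
            return None  # one task does not fit, so nothing is allocated
        tentative[node] -= task_cpus[i]
        placement[i] = node
    free_cpus.update(tentative)  # commit only after every task has a node
    return placement


cluster = {"node-a": 4, "node-b": 2}

rejected = gang_schedule([3, 3], cluster)  # second task fits nowhere
after_reject = dict(cluster)

accepted = gang_schedule([3, 2], cluster)
print(rejected, after_reject, accepted, cluster)
# None {'node-a': 4, 'node-b': 2} {0: 'node-a', 1: 'node-b'} {'node-a': 1, 'node-b': 0}
```

Committing only after the full placement succeeds is what prevents the partial-allocation deadlock the entry mentions: a rejected job never holds resources that another job is waiting for.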
statsforecast is a high-performance statistical library for time series forecasting, designed to generate point forecasts and prediction intervals. It functions as a distributed time series framework that uses a C-based forecasting engine and an automated model selector to identify and fit the optimal statistical model for each unique series in a dataset. The system also includes a time series anomaly detector that identifies unusual data points by comparing observed values against probabilistic forecast intervals. The project distinguishes itself through its ability to handle massively parallel forecasting for millions of individual series. It achieves this through a distributed computing framework, multi-core parallel execution, and compiled C kernels that accelerate the core logic of ARIMA and exponential smoothing. The system further optimizes large-scale processing using a long-format data layout and a lazy-evaluation data pipeline to reduce memory overhead. The library offers a comprehensive suite of models, including AutoARIMA, various exponential smoothing methods for intermittent or seasonal demand, Theta decomposition, and GARCH volatility modeling for financial risk. It covers broader functional areas such as multivariate forecasting with exogenous variables, time series decomposition, and model evaluation via historical cross-validation and sliding-window analysis. The library integrates with high-performance data structures such as Polars and provides utilities for serving stored models as REST endpoints for network-accessible predictions.
Scales forecasting workloads across server clusters using distributed computing and parallel execution.