26 repositories
Systems designed to distribute computational workloads across multiple networked machines.
Distinguishing note: Focuses on workload distribution and parallel processing across a cluster rather than general cluster management.
Explore 26 awesome GitHub repositories matching devops & infrastructure · Distributed Computing Frameworks.
Exo is a distributed inference engine designed to run machine learning models across local hardware. It functions as a network orchestration layer that automatically discovers available devices to form a unified computing cluster, allowing users to scale artificial intelligence workloads by distributing computational tasks across multiple machines. The platform distinguishes itself through its ability to manage the entire lifecycle of local models while providing a standardized gateway for external applications. By translating local model outputs into industry-standard formats, it enables exi
Distributes large computational workloads across multiple local devices to improve processing performance.
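The entry above does not say how work is split across the discovered devices. One common approach to running a single model across several machines is to give each device a contiguous range of layers sized to its memory. The sketch below shows that idea. The strategy and the `partition_layers` helper are illustrative assumptions, not exo's code.

```python
def partition_layers(num_layers, device_memory):
    """Hypothetical helper: give each device a contiguous layer range sized by its memory."""
    total = sum(device_memory.values())
    ranges, start, cumulative = {}, 0, 0
    for device, memory in device_memory.items():
        cumulative += memory
        # The last device always ends at num_layers because cumulative == total.
        end = round(num_layers * cumulative / total)
        ranges[device] = (start, end)
        start = end
    return ranges


print(partition_layers(32, {"laptop": 16, "desktop": 48}))
# {'laptop': (0, 8), 'desktop': (8, 32)}
```

Using cumulative shares keeps the ranges contiguous and guarantees that every layer is assigned exactly once.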
Ray is a distributed computing framework designed to scale Python and Java applications across clusters by abstracting task scheduling and resource management. It functions as a resource-aware execution engine that manages task dependencies, placement, and fault tolerance across networked compute nodes. At its core, the system provides a stateful actor model, allowing developers to define classes that run in dedicated processes to maintain and mutate internal state across remote method calls. The framework distinguishes itself through a robust cross-language interoperability layer, enabling f
A programming model that scales Python and Java applications across clusters by abstracting task scheduling and resource management.
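The stateful actor model can be illustrated without Ray at all. In this stdlib-only sketch (not Ray's API), an object lives behind a single-worker executor. That worker stands in for Ray's dedicated process. Each method call is queued and returns a future. `Actor` and `Counter` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class Actor:
    """Hypothetical actor: one object, one worker, calls executed in arrival order."""

    def __init__(self, cls, *args, **kwargs):
        self._instance = cls(*args, **kwargs)  # state is only touched by the worker
        # A single worker thread means calls never run concurrently.
        self._worker = ThreadPoolExecutor(max_workers=1)

    def call(self, method, *args, **kwargs):
        """Queue a method call and return a Future for its result."""
        return self._worker.submit(getattr(self._instance, method), *args, **kwargs)

    def stop(self):
        self._worker.shutdown(wait=True)


class Counter:
    def __init__(self):
        self.value = 0

    def increment(self, by=1):
        self.value += by
        return self.value


counter = Actor(Counter)
futures = [counter.call("increment") for _ in range(5)]
print([f.result() for f in futures])  # [1, 2, 3, 4, 5]
counter.stop()
```

The worker drains its queue in order, so callers never race on `value`. This serialization is what allows an actor to mutate internal state safely across remote method calls.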
Puter is a browser-based desktop environment and cloud-native development platform that provides a virtualized graphical workspace. It enables developers to build and deploy full-stack web applications by integrating cloud storage, authentication, and serverless backend logic directly into the browser, eliminating the need for traditional server infrastructure. The platform distinguishes itself through a unified cloud storage layer and a distributed network runtime that facilitates peer-to-peer communication and cross-origin resource fetching. It features a sophisticated cross-window orchestr
Provides a browser-native execution environment for peer-to-peer communication and decentralized applications.
Anoma is a distributed operating system designed to abstract the complexities of blockchain networks into a unified interface for cross-chain coordination. At its core, the platform utilizes a resource-based state machine and an intent-centric execution model, where user-defined goals are processed and settled by decentralized solvers rather than through direct, manual execution. This architecture enables the creation of applications that operate across heterogeneous distributed networks while maintaining a consistent developer and user experience. The platform distinguishes itself through a
Abstracts blockchain complexities to provide a unified interface for users and developers.
This project is a comprehensive microservices development framework designed to build scalable, resilient backend systems. It provides a production-ready runtime that integrates stability patterns directly into the service architecture, ensuring consistent performance and reliability for both web and remote procedure call services even under heavy traffic conditions. The framework centers on an interface-first development model, utilizing a domain-specific language to define service contracts that serve as the single source of truth. This approach powers an extensive code generation ecosystem
Provides a production-ready runtime environment designed for high performance and reliability under heavy network traffic.
Linera is a multi-chain smart contract platform designed for horizontal scalability through a microchain-based distributed ledger. By partitioning state into independent, parallel chains that share a common validator set, the protocol enables high-performance execution of modular applications. The system utilizes a WebAssembly-based runtime to ensure secure, platform-independent execution of contract logic across the network. The platform distinguishes itself through an asynchronous messaging framework that coordinates state changes between chains by queuing messages for execution in subseque
Applications are driven by operations, which execute on the local chain, and by messages, which carry cross-chain communication; messages are bundled into groups to keep cross-chain effects atomic.
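The queued-message model described above can be shown with a toy example. A block on one chain executes local operations immediately. Messages for another chain are only queued, and they execute when the recipient produces its next block. In this sketch the transfer-style state, `Chain`, and `execute_block` are hypothetical, and bundles are applied whole within a single block. It is not Linera's SDK.

```python
from collections import defaultdict, deque


class Chain:
    """Hypothetical microchain: owns its own state plus an inbox of message bundles."""

    def __init__(self, name, balance=0):
        self.name = name
        self.balance = balance
        self.inbox = deque()  # (sender, bundle) pairs queued by other chains
        self.height = 0

    def execute_block(self, operations, network):
        """Execute one block: drain queued bundles, then apply local operations."""
        # Messages sent in earlier blocks run now; each bundle is applied as a unit.
        while self.inbox:
            _sender, bundle = self.inbox.popleft()
            for amount in bundle:
                self.balance += amount
        # Operations touch only this chain's state (no balance checks in this sketch).
        outgoing = defaultdict(list)
        for recipient, amount in operations:
            self.balance -= amount
            outgoing[recipient].append(amount)
        # Everything addressed to one recipient travels together as one bundle.
        for recipient, bundle in outgoing.items():
            network[recipient].inbox.append((self.name, bundle))
        self.height += 1


network = {"alice": Chain("alice", balance=100), "bob": Chain("bob")}
network["alice"].execute_block([("bob", 30), ("bob", 20)], network)
print(network["bob"].balance)  # 0: the bundle is only queued so far
network["bob"].execute_block([], network)
print(network["bob"].balance)  # 50
```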
Hyperframes is an HTML-to-video rendering engine and composition tool that transforms web layouts and CSS into encoded video files. It functions as a headless browser video pipeline and a distributed video rendering framework, allowing users to create seekable animations and programmatic motion designs using HTML, CSS, and JavaScript. The project differentiates itself as an AI agent video orchestrator, enabling the automation of video scripts and compositions through natural language prompts. It supports distributed video encoding by splitting rendering tasks across multiple serverless functi
Implements a cloud-native infrastructure for splitting video encoding tasks across serverless functions and worker processes.
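Splitting a render across workers usually means cutting the frame timeline into contiguous chunks, rendering the chunks in parallel, and joining the results in their original order. The sketch below uses threads in place of serverless functions. `render_chunk` is a hypothetical stub that returns frame labels instead of encoded video. None of this is Hyperframes' API.

```python
from concurrent.futures import ThreadPoolExecutor


def split_frames(total_frames, chunk_size):
    """Split [0, total_frames) into contiguous (start, end) ranges."""
    return [(start, min(start + chunk_size, total_frames))
            for start in range(0, total_frames, chunk_size)]


def render_chunk(frame_range):
    """Hypothetical stand-in for a worker that renders and encodes one segment."""
    start, end = frame_range
    return [f"frame-{i}" for i in range(start, end)]


def render_distributed(total_frames, chunk_size, workers=4):
    chunks = split_frames(total_frames, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        segments = list(pool.map(render_chunk, chunks))  # map preserves chunk order
    return [frame for segment in segments for frame in segment]


print(split_frames(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Because `pool.map` returns results in submission order, the segments can be concatenated directly, even though the workers may finish out of order.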
Dapr is a distributed application runtime that provides a sidecar-based infrastructure layer for building resilient microservices and event-driven applications. By utilizing a sidecar proxy pattern, it abstracts complex infrastructure tasks into standardized, network-accessible APIs, allowing developers to focus on application logic while the runtime handles service discovery, state management, and secure communication. The platform distinguishes itself through a pluggable component architecture and language-agnostic design, enabling services written in any programming language to interact wi
Provides language-specific tools with simple interfaces for calling the runtime's building blocks and the underlying infrastructure services during development.
This project serves as a comprehensive, community-driven directory of high-quality open-source Python libraries and tools for machine learning, data science, and artificial intelligence. It functions as a centralized resource for developers to discover, evaluate, and track the maintenance status of software packages across the entire machine learning ecosystem. The platform distinguishes itself through automated popularity tracking and data-driven content curation, which programmatically validate and rank projects based on community activity and development velocity. By organizing these tools
Parallelizes training and inference workloads across large-scale compute infrastructure.
This project is a functional programming library and toolkit for building production TypeScript applications. It provides a system for managing concurrency, error handling, and resource lifecycles using functional effects. The project distinguishes itself through a comprehensive suite of specialized toolkits, including a dependency injection framework for decoupling service implementations, a workflow orchestrator for coordinating durable processes, and a SQL database toolkit for consistent data operations across multiple dialects. It also implements an OpenTelemetry instrumentation library f
Spreads heavy workloads across multiple worker nodes to process data in parallel.
Bullet3 is a professional physics simulation engine designed for calculating rigid body, soft body, and collision dynamics within 3D environments and robotics applications. It functions as a computational framework for determining complex geometric intersections and contact manifolds between objects in simulated space. The library distinguishes itself through a distributed rendering framework that scales heavy graphical workloads and scene generation tasks across large clusters of machines. This capability enables the production of massive datasets by distributing complex scene generation acr
Scales heavy graphical workloads and scene generation tasks across large clusters of machines.
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture lets the system automate the distribution of workloads across available hardware while handling complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabling global graph optimization and efficient resource allocation. It incorporates memory-aware data spilling to avoid system failures when processing datasets that exceed available memory, and uses task graph fusion to combine sequences of operations into single execution steps, minimizing scheduling overhead and inter-node communication. The platform provides a comprehensive set of capabilities for large-scale data analysis, including support for distributed machine learning, high-performance computing integration, and parallel data processing. It offers extensive tools for cluster lifecycle management, performance profiling, and real-time monitoring of task execution. Users can deploy these environments on a wide range of infrastructure, including local hardware, cloud providers, containerized systems, and high-performance computing clusters.
Provides a framework for scaling Python workflows from single machines to distributed clusters by orchestrating task graphs.
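Lazy evaluation over a task graph can be sketched in a few lines. The graph below is loosely modeled on Dask's classic low-level format, a dict that maps keys to plain values or to task tuples such as `(func, *args)`. The scheduler, however, is a standalone illustration built on the standard library's `graphlib`, not Dask's scheduler. It computes only the subgraph the requested key depends on, and it runs dependencies first. The helpers `is_task`, `deps`, and `compute` are hypothetical.

```python
from graphlib import TopologicalSorter
from operator import add, mul


def is_task(value):
    """A task is a tuple whose first element is callable, e.g. (add, "x", "y")."""
    return isinstance(value, tuple) and bool(value) and callable(value[0])


def deps(graph, value):
    """Graph keys referenced by a task's arguments."""
    if not is_task(value):
        return set()
    return {arg for arg in value[1:] if isinstance(arg, str) and arg in graph}


def compute(graph, target):
    """Lazily evaluate only what `target` needs, in dependency order."""
    needed, stack = set(), [target]
    while stack:  # walk back from the target to find the required subgraph
        key = stack.pop()
        if key not in needed:
            needed.add(key)
            stack.extend(deps(graph, graph[key]))
    order = TopologicalSorter({key: deps(graph, graph[key]) for key in needed})
    results = {}
    for key in order.static_order():  # dependencies always come first
        value = graph[key]
        if is_task(value):
            func, *args = value
            resolved = [results[a] if isinstance(a, str) and a in graph else a
                        for a in args]
            results[key] = func(*resolved)
        else:
            results[key] = value  # plain data
    return results[target]


graph = {
    "x": 3,
    "y": 4,
    "total": (add, "x", "y"),
    "scaled": (mul, "total", 10),
    "never_run": (print, "not needed by 'scaled'"),
}
print(compute(graph, "scaled"))  # 70
```

The `never_run` task is never executed because nothing requested depends on it. Deferring work until a result is asked for is also what gives a scheduler the whole graph to optimize before anything runs.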
Meshroom is a node-based photogrammetry software designed to transform collections of two-dimensional images into three-dimensional models and scene geometry. It provides a visual interface for constructing and managing modular data pipelines, allowing users to automate complex computer vision tasks such as feature extraction, depth map estimation, and mesh generation. The software distinguishes itself through a distributed computational framework that dispatches resource-intensive tasks across local hardware or remote render farms. By utilizing a directed acyclic graph execution model, it en
Dispatches resource-intensive reconstruction tasks across local hardware or remote render farms to optimize processing performance.
QuantAxis is a quantitative trading platform and algorithmic trading framework. It provides a comprehensive local environment for backtesting strategies, managing financial market data, and executing trades across stocks, futures, and options markets. The system distinguishes itself through a distributed task scheduler that spreads asynchronous computations and heavy mathematical workloads across a network of remote agents. It incorporates a multi-account trading interface to standardize the monitoring of positions and the execution of orders across various brokerage accounts. The platform c
Distributes asynchronous computational workloads across a local network of remote agents.
Metaflow is a Python machine learning framework and MLOps workflow orchestrator designed to manage the lifecycle of data pipelines from local prototyping to production. It serves as a distributed compute manager and an experiment tracking system, enabling the creation of reproducible pipelines that transition between development and high-availability production environments. The framework distinguishes itself through an integrated checkpointing system that automatically persists intermediate data artifacts to remote storage, allowing failed runs to be resumed from the last successful step. It
Distributes computational workloads across cloud CPUs and GPUs using ephemeral clusters and spot instances.
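Resuming from the last successful step, as described above, only requires each step to persist its output before the next step starts. The sketch below uses a local directory and `pickle` where Metaflow would use its remote datastore. `run_pipeline` and the simulated spot-instance failure are hypothetical and do not reflect Metaflow's API.

```python
import pickle
import tempfile
from pathlib import Path


def run_pipeline(steps, store, initial):
    """Hypothetical runner: persist each step's output so a rerun can skip it."""
    store = Path(store)
    store.mkdir(parents=True, exist_ok=True)
    data, executed = initial, []
    for name, func in steps:
        checkpoint = store / f"{name}.pkl"
        if checkpoint.exists():  # finished in an earlier run: reuse the artifact
            data = pickle.loads(checkpoint.read_bytes())
            continue
        data = func(data)
        checkpoint.write_bytes(pickle.dumps(data))  # persist before moving on
        executed.append(name)
    return data, executed


attempts = {"train": 0}


def load(_):
    return list(range(5))


def train(rows):
    attempts["train"] += 1
    if attempts["train"] == 1:
        raise RuntimeError("spot instance reclaimed")  # simulated failure
    return sum(rows)


steps = [("load", load), ("train", train)]
with tempfile.TemporaryDirectory() as tmp:
    try:
        run_pipeline(steps, tmp, None)  # fails in "train" after "load" is saved
    except RuntimeError:
        pass
    result, executed = run_pipeline(steps, tmp, None)
print(result, executed)  # 10 ['train']
```

On the second run only `train` executes, because the `load` artifact from the failed run is read back from storage.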
Hyperopt is a Python library for hyperparameter optimization designed to minimize scalar-valued objective functions. It operates as a stochastic search space engine that finds optimal input parameters by searching through real-valued, discrete, and conditional spaces. The framework distinguishes itself through its support for complex search space configurations, allowing for conditional parameter hierarchies where specific hyperparameters are sampled only if their parent parameters meet certain criteria. It is built as an asynchronous optimization framework, decoupling the generation of searc
Parallelizes the hyperparameter search process across multiple machines using external clusters or database backends.
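A conditional search space means that some parameters exist only when their parent takes a particular value. Hyperopt expresses this with nested `hp.choice` expressions and searches with algorithms such as TPE. The stdlib sketch below uses plain random search instead. It also separates proposing configurations from evaluating them, and a thread pool stands in for the remote workers. The search space and `objective` are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def sample(rng):
    """Draw one configuration; child parameters exist only under their parent's choice."""
    model = rng.choice(["svm", "tree"])
    if model == "svm":
        return {"model": "svm", "C": 10 ** rng.uniform(-3, 3)}  # log-uniform C
    return {"model": "tree", "max_depth": rng.randint(1, 10)}


def objective(params):
    """Hypothetical scalar loss to minimize."""
    if params["model"] == "svm":
        return abs(math.log10(params["C"]) - 1.0)  # lowest near C = 10
    return 0.5 + abs(params["max_depth"] - 4) / 4  # lowest at max_depth = 4


def random_search(n_trials, workers=4, seed=0):
    """Propose all configurations first, then evaluate them on a worker pool."""
    rng = random.Random(seed)
    configs = [sample(rng) for _ in range(n_trials)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        losses = list(pool.map(objective, configs))
    return min(zip(losses, configs), key=lambda pair: pair[0])


best_loss, best_params = random_search(100)
print(best_loss, best_params)
```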
ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself
Executes parallel or distributed computing tasks by initializing frameworks like Spark, Ray, or Dask directly within pipeline steps.
Apache Mesos is a distributed systems kernel and cluster resource manager that abstracts CPU, memory, and storage across a pool of nodes. It functions as a distributed infrastructure orchestrator, providing a layer for running multiple orchestration frameworks on a shared set of physical or virtual machines. The system acts as a resource isolation engine, dividing a shared cluster into isolated containers so that diverse workloads can run simultaneously. It enables multi-framework orchestration, letting different distributed application frameworks share a single infrastructure to maximize hardware utilization. The project covers large-scale compute distribution and distributed cluster management. Its capabilities include distributed resource management and isolation of compute capacity across multiple applications to prevent interference and ensure stable performance on shared servers.
Provides a distributed infrastructure for running multiple computing frameworks across networked machines.
Volcano is a Kubernetes-native batch scheduler specialized for AI, machine learning, and high-performance computing workloads. It provides gang scheduling to atomically allocate resources for all tasks of a distributed job, preventing deadlocks from partial allocation, and supports hierarchical queue management for multi-tenant resource isolation with configurable quotas, borrowing, and preemption. Topology-aware placement optimizes communication-intensive workloads by modeling network hierarchy to minimize cross-switch latency. Volcano differentiates itself with automated orchestration of di
Runs batch jobs from popular data processing, ML, and streaming frameworks without custom integration.
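Gang scheduling, as described in the Volcano entry, is all-or-nothing placement. A distributed job gets resources only if every one of its tasks fits, so a half-placed job never holds GPUs while it waits for the rest. The sketch below plans the placement on a tentative copy and commits it only on success. The first-fit policy, `Node`, and `gang_schedule` are simplifying assumptions, not Volcano's scheduler.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_gpus: int


def gang_schedule(nodes, tasks_gpus):
    """Hypothetical scheduler: place every task of a job, or none of them."""
    free = {n.name: n.free_gpus for n in nodes}  # tentative copy, nothing reserved yet
    placement = []
    for need in tasks_gpus:
        # First-fit: the first node that still has room for this task.
        target = next((name for name, gpus in free.items() if gpus >= need), None)
        if target is None:
            return None  # job stays queued; no partial allocation is committed
        free[target] -= need
        placement.append(target)
    for node in nodes:  # commit only after every task fits
        node.free_gpus = free[node.name]
    return placement


nodes = [Node("node-a", 4), Node("node-b", 2)]
print(gang_schedule(nodes, [4, 4]))     # None: only one 4-GPU slot exists
print([n.free_gpus for n in nodes])     # [4, 2]: nothing was reserved
print(gang_schedule(nodes, [2, 2, 2]))  # ['node-a', 'node-a', 'node-b']
```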
statsforecast is a high-performance statistical time series forecasting library designed to generate point forecasts and prediction intervals. It functions as a distributed time series framework that uses a C-based forecasting engine and an automated model selector to identify and fit the optimal statistical model for each unique series in a dataset. The system also includes a time series anomaly detector that flags unusual data points by comparing observed values against probabilistic forecast intervals. The project distinguishes itself by its ability to handle large-scale parallel forecasting for millions of individual series. This is achieved through a distributed computing framework, multi-core parallel execution, and compiled C kernels that accelerate the core ARIMA and exponential smoothing logic. The system further optimizes large-scale processing by using a long-format data layout and a lazy-evaluation data pipeline to reduce memory overhead. The library provides a complete set of models, including AutoARIMA, several exponential smoothing methods for intermittent or seasonal demand, Theta decomposition, and GARCH volatility modeling for financial risk. It covers broader capability areas such as multivariate forecasting with exogenous variables, time series decomposition, and model evaluation through historical cross-validation and sliding-window analysis. The library integrates with high-performance data structures such as Polars and provides utilities for serving saved models as REST endpoints for network-accessible predictions.
Scales forecasting workloads across server clusters using distributed computing and parallel execution.
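Forecasting millions of series parallelizes well because each series is fitted independently. The sketch below applies simple exponential smoothing to each series and spreads the series over a worker pool. Threads keep it short. CPU-bound fitting would use processes or a cluster backend to get a real speedup. `ses_forecast` and `forecast_many` are hypothetical helpers, not statsforecast's API.

```python
from concurrent.futures import ThreadPoolExecutor


def ses_forecast(values, alpha, horizon):
    """Simple exponential smoothing: a flat forecast from the final smoothed level."""
    level = values[0]
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def forecast_many(series, alpha=0.5, horizon=3, workers=4):
    """Fit each series independently so the work can be spread across workers."""
    ids = list(series)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda sid: ses_forecast(series[sid], alpha, horizon), ids))
    return dict(zip(ids, results))


forecasts = forecast_many({"store_1": [10, 12, 14], "store_2": [5, 5, 5]})
print(forecasts["store_1"])  # [12.5, 12.5, 12.5]
```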