9 repositories
Techniques for executing tasks across multiple processing units or nodes.
Distinguishing note: Focuses on parallel execution and hardware utilization in a distributed context.
Explore 9 GitHub repositories matching DevOps & Infrastructure · Distributed Computing.
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
Executes data transfers concurrently with computations to maximize bus bandwidth and reduce total execution time.
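As a rough illustration of overlapping transfers with computation, the sketch below prefetches batches on a background thread while the main thread computes on the previous one. It is not taken from the project: `prefetch`, `load`, and `compute` are hypothetical names, and `load` only stands in for a real host-to-device copy.

```python
import queue
import threading

def prefetch(batches, load, depth=2):
    """Yield load(batch) for each batch, loading up to `depth` batches ahead."""
    buf = queue.Queue(maxsize=depth)   # bounded buffer limits memory in flight
    sentinel = object()                # marks the end of the stream

    def producer():
        for batch in batches:
            buf.put(load(batch))       # "transfer" runs while the consumer computes
        buf.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is sentinel:
            return
        yield item

def load(batch):
    # Hypothetical stand-in for a host-to-device copy.
    return list(batch)

def compute(batch):
    # Hypothetical stand-in for the actual computation on a batch.
    return sum(x * x for x in batch)

results = [compute(b) for b in prefetch([[1, 2], [3, 4], [5]], load)]
```

The sketch omits error handling: an exception inside `load` would leave the consumer waiting, so a real pipeline would need to forward failures through the queue.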
This project is a machine learning array framework and tensor computation library designed for high-performance numerical computing. It provides a comprehensive suite of tools for constructing and training neural networks, featuring an automatic differentiation engine that facilitates gradient-based optimization and complex mathematical modeling. The library distinguishes itself through a unified memory architecture that allows data to be shared across CPU and GPU devices without explicit copies, significantly reducing data movement overhead. Its execution model relies on a lazy evaluation en
Shares processing loads across multiple physical machines using communication backends.
Meshroom is a node-based photogrammetry software designed to transform collections of two-dimensional images into three-dimensional models and scene geometry. It provides a visual interface for constructing and managing modular data pipelines, allowing users to automate complex computer vision tasks such as feature extraction, depth map estimation, and mesh generation. The software distinguishes itself through a distributed computational framework that dispatches resource-intensive tasks across local hardware or remote render farms. By utilizing a directed acyclic graph execution model, it en
Executes processing pipelines across local or remote hardware while managing node locking and resource monitoring for parallel tasks.
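A directed acyclic graph execution model of the kind described above can be sketched with the standard library: nodes whose dependencies are satisfied are dispatched in parallel, and dependents wait. This is only an assumption-laden sketch, not Meshroom's scheduler; `run_dag`, the node names, and the toy tasks are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def run_dag(deps, tasks, max_workers=4):
    """Run tasks in dependency order, executing independent nodes concurrently.

    deps  maps node -> set of predecessor nodes.
    tasks maps node -> callable taking the dict of results computed so far.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()                   # raises CycleError if the graph has a cycle
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            for node in sorter.get_ready():          # all predecessors are done
                running[pool.submit(tasks[node], results)] = node
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                node = running.pop(future)
                results[node] = future.result()
                sorter.done(node)                    # unlocks dependent nodes
    return results

# Hypothetical four-stage pipeline; the two middle stages can run in parallel.
deps = {
    "features": set(),
    "depth_maps": {"features"},
    "sparse_cloud": {"features"},
    "mesh": {"depth_maps", "sparse_cloud"},
}
tasks = {
    "features": lambda r: 10,
    "depth_maps": lambda r: r["features"] * 2,
    "sparse_cloud": lambda r: r["features"] + 1,
    "mesh": lambda r: r["depth_maps"] + r["sparse_cloud"],
}
out = run_dag(deps, tasks)
```

Swapping the thread pool for a job submitter would be one way to target remote machines, though node locking and resource monitoring are outside this sketch.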
Modin is a distributed dataframe library and parallel data processing engine designed to handle large datasets that exceed system memory. It functions as a distributed computing framework that parallelizes data manipulation tasks across multiple CPU cores or clusters to increase throughput and avoid memory errors. The project mirrors the Pandas API, allowing for the distribution of data workflows without changing core code logic. It utilizes a pluggable backend interface, which enables users to switch between different distributed execution engines to optimize performance based on available h
Manages the execution of data tasks across various backends to optimize performance based on hardware.
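The idea of a pluggable backend can be illustrated with a small sketch in which the same partitioned computation runs on interchangeable executors. This does not reproduce Modin's internal interface; `SerialBackend`, `ThreadBackend`, `partition`, and `column_sum` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

class SerialBackend:
    """Runs every partition in the calling thread."""
    def map(self, fn, parts):
        return [fn(p) for p in parts]

class ThreadBackend:
    """Runs partitions on a thread pool; same interface as SerialBackend."""
    def __init__(self, workers=4):
        self.workers = workers

    def map(self, fn, parts):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, parts))

def partition(rows, n):
    """Split rows into n contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts

def column_sum(rows, key, backend, n_parts=3):
    """Sum one column by reducing each partition, then combining the partials."""
    partials = backend.map(lambda part: sum(r[key] for r in part),
                           partition(rows, n_parts))
    return sum(partials)

rows = [{"x": i} for i in range(10)]
serial_total = column_sum(rows, "x", SerialBackend())
thread_total = column_sum(rows, "x", ThreadBackend())
```

Because the calling code depends only on `backend.map`, the execution engine can change without touching the data logic, which is the property the description attributes to the library.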
This project is a structured learning curriculum and technical reference for mastering deep learning with TensorFlow. It provides a comprehensive guide for building, training, and deploying neural networks, combining theoretical fundamentals with practical implementation examples. The repository distinguishes itself by covering the end-to-end machine learning workflow, from low-level tensor mathematics and linear algebra to the creation of complex model architectures. It includes specific guidance on developing data pipelines for diverse data types, such as images, text, and time-series seque
Implements distributed computing strategies to parallelize workloads across CPUs, GPUs, and TPUs.
PowerInfer is a high-performance local large language model inference engine and sparse inference framework. It provides a runtime for executing models on consumer-grade hardware, utilizing a GPU acceleration backend to optimize tensor operations for graphics processors. The system distinguishes itself through a sparse inference framework that increases generation speed by skipping computations based on activation sparsity in model weights. It includes a GGUF model converter for transforming weights and metadata into a unified binary format, as well as an OpenAI API compatible server for inte
Splits the compute graph into segments and distributes them across multiple nodes to parallelize model execution.
Featuretools is a Python data science library and automated feature engineering framework designed to create predictive features from multiple related datasets. It automates the data preparation and transformation steps required for machine learning models through deep feature synthesis. The library enables the automatic generation of comprehensive feature tables by applying recursive transformations to relational data. It supports the transformation of unstructured text into structured numeric features and allows users to define custom primitives to extend the synthesis process with specific
Distributes the recursive feature synthesis process across multiple cores or clusters for efficient large-scale processing.
This project is a JAX-based transformer framework and large language model trainer designed to build and train distributed models on TPU hardware accelerators. It provides a system for pre-training and fine-tuning autoregressive models by partitioning weights and computation across a device mesh to reduce memory overhead and increase processing speed. The framework includes a TPU compute orchestrator for provisioning resources and automating dependency installation on remote distributed nodes. It also has a model weight converter that can transform and re-shard checkpoints between different hardware configurations and numerical precisions. The project covers broader capabilities, including sharded checkpoint management for cloud storage, streaming data loading with state restoration, and kernel-based text generation for inference. It supports XLA-compiled hardware acceleration for TPU and GPU clusters and provides tools for benchmarking performance on standardized language tasks.
Automates dependency installation and cluster initialization on remote nodes for distributed execution.
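To make the idea of splitting weights and computation across devices concrete, here is a plain-Python sketch of column-wise sharding of a weight matrix. It is not the project's code and uses no JAX: each "device" is just a list entry, and `shard_columns` and `sharded_matmul` are hypothetical helpers.

```python
def matmul(a, b):
    """Dense matrix product of two lists-of-rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def shard_columns(w, n_devices):
    """Split weight matrix w column-wise into n_devices equal shards."""
    cols = len(w[0])
    assert cols % n_devices == 0, "columns must divide evenly across devices"
    step = cols // n_devices
    return [[row[d * step:(d + 1) * step] for row in w]
            for d in range(n_devices)]

def sharded_matmul(x, shards):
    """Multiply x by each shard, then concatenate the partial outputs."""
    # On real hardware each product would run on the device holding that shard.
    partials = [matmul(x, shard) for shard in shards]
    # Gather: join the per-device column blocks back into full output rows.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
full = matmul(x, w)
sharded = sharded_matmul(x, shard_columns(w, 2))
```

Each device stores only its slice of `w`, which is where the memory saving comes from; the price is the gather step that reassembles the output.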
This project is an alignment framework and suite of pipelines for training language models through supervised fine-tuning and preference optimization. It provides tools for running large-scale distributed training across multiple GPUs and compute nodes, as well as a system for measuring model helpfulness and dialogue quality through single-turn and multi-turn benchmarks. The framework includes specialized tools for direct preference optimization (DPO) to refine model behavior using paired data without requiring a separate reward model. It also supports constitutional AI alignment and reward model training to rank and score responses according to preference criteria. The project covers broader capabilities for dataset mixing, parameter-efficient fine-tuning via low-rank adaptation (LoRA), and rejection sampling optimization. It manages the training lifecycle through configuration-based recipes and provides systems for streaming real-time performance metrics to external dashboards.
Coordinates large-scale model alignment tasks across multiple GPUs and compute nodes.
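For readers unfamiliar with direct preference optimization, the per-pair loss in its standard formulation can be written in a few lines. The sketch below is not taken from the project: `dpo_loss` is a hypothetical helper, its inputs are assumed to be summed sequence log-probabilities, and the default `beta=0.1` is an illustrative choice.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities.

    loss = -log(sigmoid(beta * margin)), where margin is how much more the
    policy prefers the chosen response than the frozen reference model does.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), evaluated in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# When policy and reference agree, the margin is zero and the loss is log(2).
neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# When the policy favors the chosen response more than the reference, loss drops.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The reference log-probabilities play the role that a separate reward model would otherwise fill, which is why paired preference data alone is enough.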