Why is unslothai/unsloth a recommended Computational Graph Optimizers repository?

Rewrites execution paths at runtime to minimize latency and improve processing speed for complex multimodal operations.

Why is ml-explore/mlx a recommended Computational Graph Optimizers repository?

Compiles functions to merge operations and fuse kernels, reducing memory usage and increasing execution speed for complex workflows.

Why is Infrasys-AI/AISystem a recommended Computational Graph Optimizers repository?

Analyzes compute graphs to determine and insert efficient data layouts for optimized hardware performance.

Why is dask/dask a recommended Computational Graph Optimizers repository?

Analyzes and restructures task dependencies to improve execution efficiency and minimize redundant data movement.

Why is fastai/numerical-linear-algebra a recommended Computational Graph Optimizers repository?

Accelerates matrix operations through vectorization, parallelization, and just-in-time compilation.

Why is android/ndk-samples a recommended Computational Graph Optimizers repository?

Performs parallel data processing using advanced instruction sets to increase execution speed in low-level code.

7 Repos

Awesome GitHub Repositories: Computational Graph Optimizers

Tools that analyze and rewrite execution paths to improve processing speed and reduce resource usage.

Explore 7 awesome GitHub repositories matching software engineering & architecture · Computational Graph Optimizers.

unslothai/unsloth
66,628 stars
Unsloth is a high-performance training and inference platform designed to optimize the lifecycle of large language and multimodal models. It provides a comprehensive engine for fine-tuning, executing, and managing models locally, with a focus on reducing memory consumption and increasing compute speed on consumer-grade hardware. The platform distinguishes itself through hand-optimized kernels and automated computational graph techniques that maximize hardware throughput. It supports advanced training methodologies, including reinforcement learning for reasoning and efficient adapter-based fin
Rewrites execution paths at runtime to minimize latency and improve processing speed for complex multimodal operations.
Python · agent · deepseek · deepseek-r1
ml-explore/mlx
27,047 stars
This project is a machine learning array framework and tensor computation library designed for high-performance numerical computing. It provides a comprehensive suite of tools for constructing and training neural networks, featuring an automatic differentiation engine that facilitates gradient-based optimization and complex mathematical modeling. The library distinguishes itself through a unified memory architecture that allows data to be shared across CPU and GPU devices without explicit copies, significantly reducing data movement overhead. Its execution model relies on a lazy evaluation en
Compiles functions to merge operations and fuse kernels, reducing memory usage and increasing execution speed for complex workflows.
C++ · mlx
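
To make the idea of merging operations concrete, here is a minimal sketch in plain Python. It is not MLX code and does not use the MLX API; `fuse`, `run_fused`, and `run_unfused` are hypothetical names chosen for illustration. The unfused path makes one pass over the data per operation and builds an intermediate list each time, while the fused path composes the operations and makes a single pass.

```python
from typing import Callable, List, Sequence

ElementwiseOp = Callable[[float], float]


def run_unfused(ops: Sequence[ElementwiseOp], data: List[float]) -> List[float]:
    """Apply each op in its own pass, creating one intermediate list per op."""
    for op in ops:
        data = [op(x) for x in data]
    return data


def fuse(ops: Sequence[ElementwiseOp]) -> ElementwiseOp:
    """Compose element-wise ops into one function applied once per element."""
    ops = list(ops)

    def fused(x: float) -> float:
        for op in ops:
            x = op(x)
        return x

    return fused


def run_fused(ops: Sequence[ElementwiseOp], data: List[float]) -> List[float]:
    """Single pass over the data, no intermediate lists."""
    kernel = fuse(ops)
    return [kernel(x) for x in data]


# Hypothetical workload: multiply by 2, add 1, then square.
ops = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: x * x]
data = [0.0, 1.0, 2.0]

fused_result = run_fused(ops, data)
unfused_result = run_unfused(ops, data)
```

Both paths return the same values; the difference is only in how many temporary buffers are created along the way, which is the memory saving the description refers to.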
Infrasys-AI/AISystem
17,017 stars
AISystem is a comprehensive AI full-stack infrastructure project covering the entire pipeline from AI chip architecture to high-level training frameworks. It encompasses the development of AI compiler frameworks, inference engines, and distributed training orchestrators designed to coordinate workloads across a heterogeneous compute stack of CPUs, GPUs, and NPUs. The project focuses on the deep integration of software and hardware, employing software-hardware co-design to align tensor layouts with physical memory structures. It provides specialized capabilities for accelerating Transformer mo
Analyzes compute graphs to determine and insert efficient data layouts for optimized hardware performance.
Jupyter Notebook · ai · aiinfra · aisys
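
A layout-insertion pass of the kind described above can be sketched as follows. This is a simplified illustration, not code from AISystem: it assumes a linear graph in which every operator declares one preferred tensor layout, and `Op` and `insert_layout_conversions` are hypothetical names. Real compilers work on general graphs and weigh conversion cost against operator speed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    name: str
    layout: str  # layout the op consumes and produces, e.g. "NCHW" or "NHWC"


def insert_layout_conversions(ops: List[Op], input_layout: str) -> List[Op]:
    """Walk a linear graph and insert a conversion op wherever layouts disagree."""
    result: List[Op] = []
    current = input_layout
    for op in ops:
        if op.layout != current:
            # The producer's layout differs from what this op expects.
            result.append(Op(f"convert_{current}_to_{op.layout}", op.layout))
            current = op.layout
        result.append(op)
    return result


# Hypothetical model: two NHWC-friendly ops followed by one NCHW-only op.
graph = [Op("conv1", "NHWC"), Op("relu", "NHWC"), Op("custom_norm", "NCHW")]
optimized = insert_layout_conversions(graph, input_layout="NCHW")
names = [op.name for op in optimized]
```

Adjacent operators that share a layout need no conversion between them, so only two conversion nodes are inserted for the three operators here.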
dask/dask
13,746 stars
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It acts as a cluster resource manager that orchestrates computation logic by representing tasks and their dependencies as directed acyclic graphs. This architecture lets the system automate the distribution of workloads across available hardware while managing complex execution requirements. The project is distinguished by a lazy evaluation engine that defers data operations until they are explicitly requested, which enables global graph optimization and efficient resource allocation. It integrates memory-aware data spilling to prevent crashes when processing datasets that exceed available memory, and uses task graph fusion to combine sequences of operations into single execution steps, minimizing scheduling overhead and inter-node communication. The platform offers a broad surface for large-scale data analysis, including support for distributed machine learning, integration with high-performance computing, and parallel data processing. It provides extensive tooling for cluster lifecycle management, performance profiling, and real-time monitoring of task execution. Users can deploy these environments across a range of infrastructures, including local hardware, cloud providers, containerized systems, and high-performance computing clusters.
Analyzes and restructures task dependencies to improve execution efficiency and minimize redundant data movement.
Python · dask · numpy · pandas
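
The graph-based model described above can be illustrated without Dask itself. The sketch below is a toy, loosely modeled on a dictionary-of-tasks representation in which a task is a tuple of a callable followed by its arguments; `cull`, `get`, and `dependencies` here are hypothetical helpers written for this example, not Dask's implementation. It shows two simple optimizations: dropping tasks the requested result does not depend on, and computing each shared task only once.

```python
from typing import Any, Dict, Set

Graph = Dict[str, Any]


def is_task(value: Any) -> bool:
    """A task is a non-empty tuple whose first element is callable."""
    return isinstance(value, tuple) and bool(value) and callable(value[0])


def dependencies(value: Any, graph: Graph) -> Set[str]:
    """Graph keys that appear as arguments of a task."""
    if is_task(value):
        return {a for a in value[1:] if isinstance(a, str) and a in graph}
    return set()


def cull(graph: Graph, target: str) -> Graph:
    """Keep only the tasks that `target` transitively depends on."""
    needed: Set[str] = set()
    stack = [target]
    while stack:
        key = stack.pop()
        if key not in needed:
            needed.add(key)
            stack.extend(dependencies(graph[key], graph))
    return {key: value for key, value in graph.items() if key in needed}


def get(graph: Graph, target: str) -> Any:
    """Evaluate `target` lazily, computing each task at most once."""
    cache: Dict[str, Any] = {}

    def evaluate(key: str) -> Any:
        if key not in cache:
            value = graph[key]
            if is_task(value):
                func, *args = value
                resolved = [
                    evaluate(a) if isinstance(a, str) and a in graph else a
                    for a in args
                ]
                cache[key] = func(*resolved)
            else:
                cache[key] = value
        return cache[key]

    return evaluate(target)


def inc(x: int) -> int:
    return x + 1


def add(x: int, y: int) -> int:
    return x + y


# Hypothetical graph: "unused" is never needed to produce "z".
graph = {"x": 1, "y": (inc, "x"), "z": (add, "y", 10), "unused": (inc, "y")}
small = cull(graph, "z")
result = get(small, "z")
```

Nothing runs until `get` is called, which is what gives an optimizer the chance to inspect and shrink the whole graph first.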
fastai/numerical-linear-algebra
10,703 stars
This project contains the Jupyter notebooks for fast.ai's Computational Linear Algebra course, covering numerical linear algebra and scientific computing topics such as matrix decomposition, statistical modeling, and high-performance data analysis. It works primarily as an educational resource for understanding the fundamental algorithms behind matrix factorizations and numerical solvers, with worked code for solving linear systems. The material distinguishes itself through a focus on randomized numerical linear algebra, utilizing probabilistic algorithms and approximate methods to perform dimensionality reduction and matri
Accelerates matrix operations through vectorization, parallelization, and just-in-time compilation.
Jupyter Notebook · algorithms · data-science · deep-learning
android/ndk-samples
10,513 stars
The Android NDK samples provide a comprehensive collection of code examples demonstrating how to integrate C and C++ native code into Android applications. This repository serves as a practical guide for developers utilizing the Android Native Development Kit to implement performance-critical application components that require direct hardware access and low-level system interaction. The project highlights the use of the Java Native Interface to bridge managed code with native modules, enabling cross-language function calls and efficient data exchange. It demonstrates how to manage native act
Performs parallel data processing using advanced instruction sets to increase execution speed in low-level code.
C++
openmlsys/openmlsys
4,813 stars
This project is a comprehensive educational resource and curriculum focused on the design and implementation of the entire machine learning software and hardware stack. It serves as a technical reference for the architecture of machine learning systems, ranging from low-level programming interfaces to large-scale deployment infrastructure. The project offers instructional guidance on several specialized areas, including the development of AI compilers through intermediate representations and graph optimizations. It covers the architectural patterns required for distributed training across GPU clusters, as well as the programming of hardware accelerators to optimize workloads on specialized chips. The resource also describes the implementation of model serving frameworks for production environments and the design of reinforcement learning pipelines. Its scope extends to the core components of ML systems, such as automatic differentiation, tensor abstractions, and the orchestration of GPU resources.
Analyzes and rewrites execution paths to improve processing speed and reduce resource usage in compute graphs.
TeX · computer-systems · machine-learning · software-architecture
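
One of the simplest graph optimizations an AI compiler can apply to an intermediate representation is constant folding. The sketch below is an assumed example, not material taken from the OpenMLSys text: expressions are nested tuples of the form `(op, left, right)`, strings stand for runtime inputs, and `fold_constants` is a hypothetical helper that evaluates every subexpression whose operands are already known.

```python
import operator
from typing import Union

Expr = Union[int, float, str, tuple]

# Supported binary operators in this toy intermediate representation.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold_constants(expr: Expr) -> Expr:
    """Replace subexpressions with constant operands by their computed value."""
    if not isinstance(expr, tuple):
        return expr  # a constant or a named runtime input
    op, left, right = expr
    left = fold_constants(left)
    right = fold_constants(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)
    return (op, left, right)


# (2 * 3) + (x - (1 + 1)), where "x" is only known at run time.
expr = ("add", ("mul", 2, 3), ("sub", "x", ("add", 1, 1)))
folded = fold_constants(expr)
```

The rewritten graph has two fewer operations to execute at run time, which is the sense in which such passes reduce resource usage.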