ANE

ANE is an open-source framework for training neural networks directly on Apple's Neural Engine hardware, bypassing Apple's public Core ML toolchain through reverse-engineered private APIs. It provides low-level control over the ANE, enabling developers to compile custom compute graphs into binary kernels, partition transformer model layers into hardware-compatible subgraphs, and share GPU-allocated memory with the Neural Engine via zero-copy IOSurface buffers.
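The layer-partitioning idea above can be illustrated with a small greedy packer. This is only a sketch under stated assumptions: the `Layer` type, the `partition_layers` function, and the byte budget are hypothetical names invented for illustration, and the real framework's constraints (memory and instruction limits) are reduced here to a single per-subgraph memory budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    memory_bytes: int  # hypothetical working-set estimate for this layer


def partition_layers(layers, budget_bytes):
    """Greedily pack consecutive layers into subgraphs that fit the budget."""
    subgraphs, current, used = [], [], 0
    for layer in layers:
        if layer.memory_bytes > budget_bytes:
            raise ValueError(f"{layer.name} exceeds the budget on its own")
        # Close the current subgraph when the next layer would overflow it.
        if current and used + layer.memory_bytes > budget_bytes:
            subgraphs.append(current)
            current, used = [], 0
        current.append(layer)
        used += layer.memory_bytes
    if current:
        subgraphs.append(current)
    return subgraphs


layers = [Layer("attn_0", 6), Layer("mlp_0", 8),
          Layer("attn_1", 6), Layer("mlp_1", 8)]
parts = partition_layers(layers, budget_bytes=16)
print([[layer.name for layer in part] for part in parts])
# prints [['attn_0', 'mlp_0'], ['attn_1', 'mlp_1']]
```

Greedy packing keeps layers in execution order, which matters because each subgraph would be compiled and dispatched as one unit; a real partitioner would likely also account for operator support and instruction counts.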

The framework also exposes hardware performance counters and power telemetry for benchmarking throughput and energy consumption, and it includes a quantization pass that converts weights and activations to INT8 precision to reduce memory bandwidth. A checkpoint-based compile bypass serializes compiled kernel state to disk, so training can resume without recompiling, which sidesteps hardware compile-time limits.
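The compile bypass can be pictured as a checkpoint that stores compiled kernels alongside the training step, so a resumed run reuses them instead of compiling again. The sketch below is a simplified model, not the project's actual file format or API: `compile_kernel`, `load_checkpoint`, `save_checkpoint`, and `run` are hypothetical names, and a string stands in for a compiled kernel binary.

```python
import json
import os
import tempfile


def compile_kernel(graph_name):
    """Hypothetical stand-in for an expensive, limited kernel compile."""
    return f"compiled:{graph_name}"


def load_checkpoint(path):
    """Return saved state, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0, "kernels": {}}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash leaves no partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, graph_names, steps_this_run):
    """Resume from the checkpoint and compile only kernels not saved in it."""
    state = load_checkpoint(path)
    compiled_now = 0
    for name in graph_names:
        if name not in state["kernels"]:
            state["kernels"][name] = compile_kernel(name)
            compiled_now += 1
    state["step"] += steps_this_run  # stand-in for the actual training loop
    save_checkpoint(path, state)
    return state["step"], compiled_now


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "ckpt.json")
    print(run(ckpt, ["attn", "mlp"], 100))  # prints (100, 2)
    print(run(ckpt, ["attn", "mlp"], 100))  # prints (200, 0)
```

The second call compiles nothing because both kernels are already in the checkpoint, which is the property that lets a restarted process avoid spending its compile budget again.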

The project's documentation covers installation and usage of these capabilities through its reverse-engineered API bindings.
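As a rough illustration of INT8 quantization for weights or activations, the following sketch applies symmetric per-tensor scaling. It is an assumption that the framework uses this particular scheme; the source only states that values are converted to INT8. The function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (integer codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floating-point values from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, worst_error)
```

Each code fits in one byte instead of the two or four bytes of a floating-point value, which is where the memory-bandwidth saving comes from; the price is a rounding error of at most half a scale step per value.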

Features

Neural Engine Training Toolkits - Trains transformer and other neural network models directly on Apple Neural Engine hardware using low-level APIs.

Apple Neural Engine Transformer Training - Enables transformer model training to run directly on Apple's Neural Engine hardware via low-level APIs.


Hardware-Aware Graph Partitioning - Splits transformer model layers into ANE-compatible subgraphs respecting hardware memory and instruction constraints.

8-Bit Inference Quantizers - Quantizes model weights and activations to 8-bit integers to reduce memory bandwidth on ANE hardware.

Graph-Level INT8 Quantization Passes - Inserts quantization and dequantization operations into compute graphs for INT8 precision on ANE.

Apple Neural Engine Training Frameworks - An open-source framework for training neural networks directly on Apple's Neural Engine using reverse-engineered APIs.

Checkpoint Resume - Restarts training from a saved checkpoint, reusing serialized compiled kernel state to bypass compile limits on the Neural Engine.

Reverse-Engineered API Clients - Wraps undocumented Apple Neural Engine kernel APIs through manual reverse engineering to expose low-level hardware control.


Zero-Copy GPU Buffer Interop - Shares GPU-allocated buffers directly with the Neural Engine via IOSurface to eliminate data copying.

Hardware Performance Counter Integrations - Reads ANE performance counters and power telemetry directly to benchmark throughput and energy consumption.

Neural Engine Kernel Compilations - Compiles custom compute graphs into ANE-specific binary kernels using reverse-engineered private compiler services.

Checkpoint-Based Compile Bypasses - Provides a checkpoint-based compile bypass that sidesteps hardware compile-time limits for ANE training.

Neural Network Operation Benchmarking - Measures throughput and power consumption of custom compute graphs running on Apple Neural Engine hardware.

Neural Engine Quantization Tools - Quantizes model weights and activations to INT8 precision for reduced memory bandwidth on Apple hardware.

Neural Engine Benchmarking Tools - Measures throughput and power consumption of custom compute graphs executed on the Apple Neural Engine.
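The throughput and energy measurements listed above reduce to two small calculations, sketched here in plain Python. This does not read any ANE counters: `measure_throughput`, `energy_joules`, the workload, and the power samples are all hypothetical placeholders, and the sampling interval is assumed to be constant.

```python
import time


def measure_throughput(fn, iterations, ops_per_call):
    """Time repeated calls and return (operations per second, elapsed seconds)."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf"), elapsed
    return iterations * ops_per_call / elapsed, elapsed


def energy_joules(power_samples_watts, interval_s):
    """Integrate evenly spaced power samples with the rectangle rule."""
    return sum(power_samples_watts) * interval_s


def workload():
    # Placeholder for dispatching one compiled compute graph.
    sum(i * i for i in range(1000))


ops_per_s, elapsed = measure_throughput(workload, iterations=200, ops_per_call=1000)
print(f"{ops_per_s:.0f} ops/s over {elapsed:.4f} s")
print(energy_joules([2.0, 2.5, 2.5, 2.0], interval_s=0.5), "J")  # prints 4.5 J
```

Dividing the operation count by the energy figure gives operations per joule, a common way to compare efficiency across configurations.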

maderixANE
