zml is a machine learning model compiler and cross-platform inference engine that transforms model descriptions into optimized executable binaries for specific hardware accelerators. It functions as a model deployment toolkit and hardware-agnostic orchestrator, utilizing a tensor-based architecture definition to provide strong type checking during the compilation process.
The project distinguishes itself through the ability to shard tensors and distribute large-scale AI workloads across a logical mesh of multiple devices. It further supports the remote model lifecycle by authenticating and downloading gated model weights from cloud repositories to integrate them into a local inference engine.
The toolkit covers a broad range of capabilities, including model architecture validation against reference activations, cross-platform binary compilation for various operating systems, and the packaging of compiled models into container images and archives for server deployment. It also provides mechanisms for tensor buffer management and the porting of models from other formats.