This project is a framework for running Stable Diffusion image generation models on Apple Silicon with Core ML hardware acceleration. It provides a local generative AI pipeline, usable from Swift and Python, that produces images from text prompts without relying on external cloud APIs.
The system includes a model converter to transform deep learning checkpoints into Core ML formats and a model optimizer to quantize weights and activations. It features a ControlNet integration layer to guide image generation using external signals such as edge and depth maps.
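To make the weight quantization step concrete, here is a minimal sketch of symmetric linear quantization in plain Python. It is an illustration of the general technique only, not this project's optimizer or any Core ML API; the function names `quantize_symmetric` and `dequantize` are hypothetical, and per-tensor scaling is an assumption.

```python
def quantize_symmetric(weights, bits=8):
    """Map floats to signed integers in [-qmax, qmax] with one shared scale.

    Hypothetical helper for illustration; assumes per-tensor scaling.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    # Avoid a zero scale when every weight is zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


weights = [-1.0, -0.3, 0.0, 0.25, 1.0]
codes, scale = quantize_symmetric(weights)
restored = dequantize(codes, scale)
```

Each weight is stored as a small integer plus one shared floating-point scale, so the reconstruction error per weight is at most half the scale.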
Capabilities include text-to-image generation with multilingual text encoding and image safety verification. Weight compression, palettization, and model splitting keep models within hardware memory constraints, while compute planning and quantization reduce prediction latency.
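Palettization replaces each weight with an index into a small lookup table of shared values. The sketch below shows the idea with a one-dimensional k-means in plain Python. It is an assumption-laden illustration, not the project's implementation: the name `palettize`, the evenly spaced initialization, and the fixed iteration count are all hypothetical choices.

```python
def palettize(weights, n_bits=2, iterations=20):
    """Cluster weights into 2**n_bits shared values (a lookup table).

    Returns (palette, indices); weights[i] is approximated by
    palette[indices[i]]. Hypothetical helper for illustration only.
    """
    k = 2 ** n_bits
    lo, hi = min(weights), max(weights)
    # Assumed initialization: centroids evenly spaced over the value range.
    palette = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def nearest(w):
        return min(range(k), key=lambda c: abs(w - palette[c]))

    for _ in range(iterations):
        indices = [nearest(w) for w in weights]
        for c in range(k):
            members = [w for w, i in zip(weights, indices) if i == c]
            if members:  # leave empty clusters where they are
                palette[c] = sum(members) / len(members)

    indices = [nearest(w) for w in weights]
    return palette, indices


weights = [0.0, 0.01, 0.02, 1.0, 1.01, 1.02]
palette, indices = palettize(weights, n_bits=1)
```

With `n_bits=1`, each weight costs one bit plus a two-entry table, which is where the memory saving comes from; larger tables trade memory for lower approximation error.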
The implementation provides native interfaces for both Python and Swift to integrate generative pipelines into macOS and iOS applications.