Corenet is a deep learning training framework and computer vision model library designed for developing neural networks across vision, text, and audio modalities. It functions as a distributed training orchestrator for scaling workloads across multiple compute nodes and provides a multimodal data pipeline for processing image, text, and video data.
The project includes a model conversion toolkit for transforming weights and architectures between different machine learning frameworks. It also provides tools for optimizing model performance on Apple Silicon and reducing response latency in generative models.
The framework covers a broad range of capabilities, including visual recognition tasks such as object detection, semantic segmentation, and image classification. It supports advanced training techniques such as parameter-efficient fine-tuning, contrastive language-image pre-training, and structural reparameterization.
Training and evaluation pipelines are managed through YAML-based configuration files and recipes to ensure reproducibility across environments.