30 open-source projects similar to microsoft/computervision-recipes, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Computervision Recipes alternative.
Gluon-CV is an MXNet computer vision library that provides a comprehensive collection of pre-implemented vision architectures and training pipelines. It serves as a deep learning research toolkit and a model zoo containing state-of-the-art pre-trained weights for image and video analysis. The project includes a specialized human pose estimation library and a model compression toolkit. These tools allow for the pruning and quantization of deep learning models to increase inference speed and facilitate deployment on constrained edge hardware. The library covers a broad range of vision capabili
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
PaddleDetection is an object detection framework designed for the end-to-end development, training, and deployment of computer vision models. It provides a comprehensive library of modular neural network architectures and pipelines that support object detection, instance segmentation, and multi-object tracking tasks. The project distinguishes itself through a configuration-driven approach that decouples model components like backbones and heads, allowing for the flexible assembly of custom vision workflows. It incorporates advanced techniques such as anchor-free detection logic, joint detecti
Ultralytics is a comprehensive computer vision framework designed for training, validating, and deploying deep learning models across a wide range of visual recognition tasks. It provides a unified interface for core operations including object detection, instance segmentation, pose estimation, and image classification. By utilizing a modular architecture, the platform allows users to swap model components to balance inference speed and accuracy requirements for diverse applications. The framework distinguishes itself through its support for real-time processing and flexible deployment. It in
ImageAI is a Python computer vision library providing a suite of tools for image classification, object detection, and video analytics. It functions as an integrated framework for locating and labeling objects in static images and video streams, utilizing deep learning models for identification and categorization. The project includes a model training toolkit that allows for the creation of custom classifiers and detectors through scratch training or transfer learning. It features a GPU-accelerated inference engine to increase processing speed for vision tasks and includes specialized utiliti
RF-DETR is a Python library for training and deploying object detection, instance segmentation, and keypoint detection models built on a vision transformer architecture. It provides a unified command-line interface and Python API for the full workflow, from fine-tuning pretrained checkpoints on custom datasets to running inference on images, video files, and live camera streams. The project supports training on datasets in COCO or YOLO format, with automatic format detection and configurable augmentation pipelines. Models can be exported to ONNX, TFLite, or TensorRT for deployment across edge
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
Detectron2 is a PyTorch computer vision framework and visual recognition platform designed for training and deploying models for object detection, image segmentation, and visual recognition. It provides a research-oriented environment for training complex vision models with multi-GPU acceleration. The project includes a specialized object detection library for identifying and locating multiple objects via bounding boxes, as well as an image segmentation toolkit for creating pixel-level masks through instance, semantic, and panoptic segmentation. Additionally, it features a human pose estimati
This project is a PyTorch-based computer vision library and deep learning image processing framework. It provides a collection of neural network architectures designed for visual analysis tasks, specifically focusing on image classification, object detection, and semantic segmentation. The toolset implements diverse methodologies for visual recognition, including anchor-free object detection, regional proposal networks, and heatmap-based keypoint estimation. It utilizes both convolutional neural networks for spatial feature extraction and transformer-based self-attention mechanisms to compute
This project is a collection of educational Jupyter Notebooks providing tutorials on neural network construction and tensor operations using the TensorFlow framework. It serves as a machine learning educational repository and implementation guide for deep learning students. The suite focuses on specific advanced architectures, including convolutional networks for image classification, residual networks with skip connections for training stability, and variational autoencoders for generative modeling and data synthesis. It also includes guides for building denoising and deep autoencoders to pe
Corenet is a deep learning training framework and computer vision model library designed for developing neural networks across vision, text, and audio modalities. It functions as a distributed training orchestrator for scaling workloads across multiple compute nodes and provides a multimodal data pipeline for processing image, text, and video data. The project includes a model conversion toolkit for transforming weights and architectures between different machine learning frameworks. It also provides tools for optimizing model performance on Apple Silicon and reducing response latency in gene
Fastai is a high-level deep learning library built on PyTorch that provides a unified interface for managing the entire machine learning lifecycle. It functions as a comprehensive training toolkit, abstracting hardware management and automating complex training loops to simplify the construction and execution of neural network models. The framework is distinguished by its notebook-centric development environment and a type-dispatching data pipeline that automatically applies transformations based on input data formats. It emphasizes transfer learning through discriminative layer-wise optimiza
Swin-Transformer is a deep learning framework designed for training and deploying hierarchical vision transformer models. It serves as a research library and toolkit for computer vision tasks, providing the infrastructure to build models that replace standard convolution operations with sliding window self-attention mechanisms. By utilizing a multi-scale feature hierarchy, the framework enables the processing of visual data at varying resolutions and spatial scales. The project distinguishes itself through its implementation of shifted window partitioning, which facilitates global information
This project is a modular PyTorch framework for training and evaluating object detection and instance segmentation models. It serves as a computer vision research tool and a deep learning inference engine designed to identify object locations, classes, and pixel-level masks within images. The framework implements a two-stage inference pipeline that utilizes region proposal networks and a symmetric mask-head architecture. It provides specialized capabilities for instance segmentation, object bounding box detection, and human pose estimation via anatomical keypoint detection. The system includ
This is a structured deep learning curriculum for programmers, delivered as a collection of Jupyter notebooks. It teaches the fundamentals of training neural networks for computer vision, natural language processing, tabular data analysis, and collaborative filtering using PyTorch and the fastai library. The course is designed to be hands-on, guiding learners from building a training loop from scratch to fine-tuning pretrained models for a variety of practical tasks. The curriculum distinguishes itself by covering the full lifecycle of a deep learning project, from data preparation and augmen
This project provides a deep residual network framework and pre-trained PyTorch models designed for high-accuracy image recognition. It implements a neural network architecture that utilizes skip connections to enable the training of very deep models without gradient degradation. The system is designed for computer vision tasks, including image classification, object detection, and visual data segmentation. It includes weights trained on ImageNet to support transfer learning and the fine-tuning of models on custom image datasets. The architectural design focuses on residual learning blocks,
This project is a self-supervised vision foundation model based on a vision transformer architecture. It is designed to learn dense visual representations from unlabeled images, serving as a general-purpose backbone for a wide variety of downstream vision tasks. The system is distinguished by its use of self-distillation and masked image modeling to extract semantic and geometric features. It also incorporates an image-text alignment model that maps visual embeddings to textual descriptions, enabling zero-shot image recognition, zero-shot segmentation, and cross-modal retrieval. The project
This is a real-time object detection framework built on the YOLOv3 architecture, implemented in PyTorch. It provides a complete pipeline for identifying and localizing objects in images and video using a single neural network pass, combining a Darknet-53 backbone with multi-scale feature pyramids and anchor-based bounding box prediction. The framework extends beyond basic detection to include instance segmentation, human pose estimation, and multi-object tracking across video frames. It offers a model export toolkit that converts trained models through ONNX to CoreML, TensorFlow Lite, and Ten
This project is a comprehensive framework and toolkit for developing, optimizing, and deploying transformer-based models across multimodal, document intelligence, and natural language processing tasks. It provides a unified neural architecture that processes text, vision, audio, and document layout data through a shared set of weights, enabling researchers and developers to build foundational models that align cross-modal representations. The platform distinguishes itself through advanced training and inference strategies designed for large-scale deep learning. It incorporates specialized mec
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
This repository provides a collection of reference implementations and code examples for training and deploying machine learning models using the MLX framework. It serves as a practical guide for executing distributed training, fine-tuning large language models, converting model weights, and implementing multimodal generative workflows. The project distinguishes itself through specialized examples for local hardware execution, featuring weight quantization to reduce memory usage and low-rank adaptation for parameter-efficient fine-tuning. It also includes scripts for transforming external mod
InternVL is a vision-language model framework that fuses a visual encoder with a large language model to translate image features into textual tokens for reasoning. It provides a system for multimodal inference and dialogue, enabling the processing of images and text to answer questions or generate descriptions. The project is distinguished by its high-resolution image processing, which uses dynamic tiling to maintain detail for images up to 4K resolution, and its chain-of-thought visual reasoning for solving complex mathematical and spatial problems. It also supports temporal frame sampling
PaddleClas is a toolkit for image classification and recognition built on PaddlePaddle. It provides a suite of tools for training deep learning models and a framework for implementing visual search and retrieval systems. The project includes a computer vision model optimization suite and tools for cross-platform deployment. It enables the export of trained models to servers, mobile devices, and edge hardware to achieve high-performance inference across different programming languages. The toolkit covers model compression and optimization through pruning, quantization, and knowledge distillat
This is an open-source autonomous driving perception pipeline that processes camera and lidar sensor data to detect, track, and fuse objects in real-world driving environments. The project integrates an end-to-end perception workflow combining sensor calibration, deep learning object detection, Kalman filter tracking, and sensor fusion for robust scene understanding. The pipeline includes camera calibration tools to remove lens distortion from raw images, deep learning model training for object classification and detection, and multi-object tracking using Kalman filters with data association
ccv is a computer vision library written in C designed for high-performance visual analysis. It serves as a framework for image classification, object detection, and the identification of faces, pedestrians, and vehicles. The library distinguishes itself through hardware-accelerated vision and deep learning inference optimizations. It utilizes a quantized tensor processor to transform floating-point data into eight-bit integers and implements integer-quantized attention mechanisms to reduce memory bandwidth and increase data throughput. The project covers a broad range of capabilities, inclu
YOLOv7 is a PyTorch vision library and real-time inference engine designed for object detection, human pose estimation, and instance segmentation. It provides a framework for detecting and locating multiple objects within images or video streams using neural networks. The system includes tools for custom model training and fine-tuning, allowing pre-trained weights to be adapted to specialized datasets via transfer learning. It also supports model weight export and format conversion to facilitate deployment on production servers and embedded edge devices.
nnU-Net is a PyTorch-based deep learning framework for the supervised semantic segmentation of 2D and 3D biomedical images. It functions as an automated medical imaging pipeline that generates predicted masks and labels from clinical images. The system distinguishes itself by using dataset-driven auto-configuration to automatically select the optimal network architecture, preprocessing steps, and training hyperparameters based on the specific properties of the input medical dataset. The framework covers a broad range of capabilities including medical dataset preparation, intensity normalizat
YOLOv5 is a comprehensive computer vision framework designed for end-to-end deep learning, specializing in real-time object detection, image classification, and instance segmentation. It provides a unified toolkit that manages the entire lifecycle of a model, from initial dataset configuration and hyperparameter tuning to high-speed inference and deployment. The framework utilizes a modular neural architecture, allowing users to swap backbone and head components to tailor models for specific visual tasks. What distinguishes this project is its focus on production-ready deployment and model ef
This is a collection of tutorials and practical demonstrations for implementing machine learning tasks using the HuggingFace Transformers library. It serves as a guide for applying transformer architectures across computer vision, natural language processing, and audio analysis. The repository provides implementation examples for multimodal model deployment, including the combination of text, image, and audio inputs. It includes resources for optimizing pre-trained models through fine-tuning on custom datasets and provides examples for preparing PyTorch datasets by converting raw files into t
AutoGluon is an automated machine learning framework and multimodal library designed to automate the end-to-end pipeline from data preprocessing to high-accuracy model training and validation. It functions as an automated model trainer for tabular, image, text, and time series data, as well as a tool for time series forecasting and foundation model finetuning. The project is distinguished by its ability to jointly process and fuse different data types, allowing for the construction of multimodal neural networks that integrate images, text, and structured tables. It supports zero-shot inferenc