30 open-source projects similar to veronikayurchuk/pretrained-models.pytorch, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best pretrained-models.pytorch alternative.
This is a PyTorch CNN visualization toolkit designed for neural network interpretability. It provides a set of tools to explain model decisions and analyze the internal behavior of convolutional neural networks through the visualization of activations, gradients, and filters. The project implements specialized techniques for synthesizing representative images, including Deep Dream optimizations to amplify patterns and class-specific image generation via input optimization. It also features a saliency map generator that produces gradient-based heatmaps to identify the specific image regions in
YOLO-World is a vision-language framework and open-vocabulary object detection model. It identifies objects in images and video based on free-form text prompts without requiring predefined category labels. The system enables the identification of arbitrary objects by fusing image features with text embeddings. It includes a specialized tool for automated image labeling, which generates bounding box annotations for custom datasets using text-based prompts. The project provides a deployment pipeline for converting models into quantized ONNX and TFLite formats, supporting real-time inference on
Albumentations is an image augmentation library and computer vision preprocessing tool designed to expand datasets for deep learning models. It provides a collection of transformations that modify pixel values and spatial geometry to increase the diversity of training samples and improve model generalization. The library supports both 2D image augmentation and 3D volumetric data augmentation. It handles a variety of labels alongside images, ensuring that bounding boxes, keypoints, and segmentation masks remain accurately aligned when spatial transformations are applied. The tool incorporates
imgaug is a Python library for machine learning data augmentation and computer vision dataset expansion. It provides tools to increase the volume and variety of training sets by applying random geometric, color, and noise transformations to images. The library ensures spatial consistency by synchronizing transformations across images and their associated annotations, such as bounding boxes, keypoints, and segmentation maps. It uses a compositional pipeline pattern to chain multiple augmentations into sequences and employs deterministic seed management to reproduce specific data samples. The
Darknet is a high-performance C-based inference engine and computer vision library designed for real-time object identification and localization. It serves as a neural network framework for training and deploying detection models using the YOLO architecture, providing a toolset for deep learning training and deployment. The project differentiates itself through a C and CUDA implementation that enables hardware acceleration for matrix multiplication and inference speed optimization. It provides a shared library interface for embedding detection capabilities into external applications and suppo
An all-in-one toolkit for computer vision
Meshroom is a node-based photogrammetry software designed to transform collections of two-dimensional images into three-dimensional models and scene geometry. It provides a visual interface for constructing and managing modular data pipelines, allowing users to automate complex computer vision tasks such as feature extraction, depth map estimation, and mesh generation. The software distinguishes itself through a distributed computational framework that dispatches resource-intensive tasks across local hardware or remote render farms. By utilizing a directed acyclic graph execution model, it en
This is a PyTorch object detection framework that implements the Single Shot MultiBox Detector for identifying and localizing multiple objects within images and video. The project provides a neural network architecture designed for single-shot object detection, which predicts bounding boxes and class labels in one pass. The implementation includes a real-time object detector capable of processing live video streams to track and label objects across sequential frames. It also features a complete computer vision training pipeline for preparing image datasets and training model weights. The fra
FLAME (Fire Luminosity Airborne-based Machine learning Evaluation) Dataset
Clojure wrapper for the Tesseract OCR software
A collection of computer vision pre-trained models.
Fast neural style transfer for images, implemented in PyTorch
Pixel-wise segmentation on the VOC2012 dataset using PyTorch.
DroneAid uses machine learning to detect calls for help on the ground placed by those in need. At the heart of DroneAid is a Symbol Language that is used to train a visual recognition model. That model analyzes video from a drone to detect and count specific images. A dashboard can be used to…
An open-source application for biological image analysis
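A computer vision closed-loop learning platform where code can be run interactively online. A closed learning loop: the Chinese-language e-book "Computer Vision in Action: Algorithms and Applications", source code, and a reader community (continuously updated ...) 📘 Online e-book: https://charmve.github.io/computer-vision-in-action/ 👇 Project homepage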
DroneNet is Joseph Redmon's YOLO real-time object detection system retrained on 2664 labeled images of DJI drones. The original and labeled images used for retraining can be found under the image and label folders respectively.
OpenPose is a real-time pose estimation engine designed to detect and track human body, face, hand, and foot landmarks. It functions as a multi-person motion tracker, identifying the spatial coordinates of multiple individuals simultaneously within video streams or static images. Beyond two-dimensional detection, the software acts as a three-dimensional kinematics processor, reconstructing spatial movement data from single or multiple synchronized camera perspectives. The system distinguishes itself through a bottom-up approach that utilizes part-affinity fields to associate body parts across
Albumentations is a computer vision image augmentation library designed to increase training data diversity for deep learning models. It provides a toolset for applying geometric and color transformations to images and annotations, including a specialized collection of 3D operations for volumetric data used in medical and scientific imaging. The library functions as an image mask and bounding box transformer, automatically updating masks, bounding boxes, and keypoints when images undergo geometric changes. This ensures that spatial alterations remain synchronized across images and their assoc