30 open-source projects similar to longcw/yolo2-pytorch, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Yolo2 Pytorch alternative.
This is a PyTorch object detection framework that implements the Single Shot MultiBox Detector for identifying and localizing multiple objects within images and video. The project provides a neural network architecture designed for single-shot object detection, which predicts bounding boxes and class labels in one pass. The implementation includes a real-time object detector capable of processing live video streams to track and label objects across sequential frames. It also features a complete computer vision training pipeline for preparing image datasets and training model weights. The fra
(Disclaimer: this is a work in progress and does not feature all the functionality of Detectron. Currently only inference and evaluation are supported -- no training) (News: Now supporting FPN and ResNet-101!)
PyTorch module to use OpenFace's nn4.small2.v1.t7 model
This is a PyTorch-based computer vision library for detecting 2D and 3D facial landmark coordinates. It functions as a facial landmark detector and reconstruction tool, utilizing deep learning to identify precise geometric points on human faces from image datasets. The library allows for the selection of specific detection backends to balance accuracy and processing speed. It supports the integration of precomputed bounding box files, which enables the system to bypass the initial detection phase and proceed directly to landmark extraction. The toolkit includes capabilities for batch image p
I decided to sync up this repo with self-critical.pytorch. (The old master is archived in the old master branch.)
Pixel-wise segmentation on the VOC2012 dataset using PyTorch.
Neural Style and MSG-Net
This project is a modular PyTorch framework for training and evaluating object detection and instance segmentation models. It serves as a computer vision research tool and a deep learning inference engine designed to identify object locations, classes, and pixel-level masks within images. The framework implements a two-stage inference pipeline that utilizes region proposal networks and a symmetric mask-head architecture. It provides specialized capabilities for instance segmentation, object bounding box detection, and human pose estimation via anatomical keypoint detection. The system includ
This project is a library of pretrained computer vision architectures and backbones for image classification and feature extraction. It serves as a comprehensive model zoo and collection of standardized image encoders, including ResNet, Vision Transformers, and EfficientNet, for use in visual analysis and as backbones for object detection and image segmentation. The library provides a framework for distributed training and evaluation of image models using advanced data augmentation and optimization scripts. It includes a dedicated toolset for converting trained PyTorch vision models into the
About PyTorch 1.2.0: the master branch now supports PyTorch 1.2.0 by default. Because of serious version incompatibilities (especially in torch.utils.data.dataloader), MDSR functions are temporarily disabled. If you need to train or evaluate the MDSR model, please use the legacy branches.
A PyTorch implementation for PyramidNets (Deep Pyramidal Residual Networks, https://arxiv.org/abs/1610.02915)
Detectron2 is a PyTorch computer vision framework and visual recognition platform designed for training and deploying models for object detection, image segmentation, and visual recognition. It provides a research-oriented environment for training complex vision models with multi-GPU acceleration. The project includes a specialized object detection library for identifying and locating multiple objects via bounding boxes, as well as an image segmentation toolkit for creating pixel-level masks through instance, semantic, and panoptic segmentation. Additionally, it features a human pose estimati
PyTorch implementation of the Realtime Multi-Person Pose Estimation project.
PyTorch implementation of Deformable Convolution
PyTorch implementation of Fader Networks (NIPS 2017).
Darknet is a high-performance C-based inference engine and computer vision library designed for real-time object identification and localization. It serves as a neural network framework for training and deploying detection models using the YOLO architecture, providing a toolset for deep learning training and deployment. The project differentiates itself through a C and CUDA implementation that enables hardware acceleration for matrix multiplication and inference speed optimization. It provides a shared library interface for embedding detection capabilities into external applications and suppo
YOLO-World is a vision-language framework and open-vocabulary object detection model. It identifies objects in images and video based on free-form text prompts without requiring predefined category labels. The system enables the identification of arbitrary objects by fusing image features with text embeddings. It includes a specialized tool for automated image labeling, which generates bounding box annotations for custom datasets using text-based prompts. The project provides a deployment pipeline for converting models into quantized ONNX and TFLite formats, supporting real-time inference on
OpenPose is a real-time pose estimation engine designed to detect and track human body, face, hand, and foot landmarks. It functions as a multi-person motion tracker, identifying the spatial coordinates of multiple individuals simultaneously within video streams or static images. Beyond two-dimensional detection, the software acts as a three-dimensional kinematics processor, reconstructing spatial movement data from single or multiple synchronized camera perspectives. The system distinguishes itself through a bottom-up approach that utilizes part-affinity fields to associate body parts across
PyTorch implementation of fast-neural-style
Train an RL agent to execute natural language instructions in a 3D Environment (PyTorch)
This project is an unsupervised image restoration tool that uses a convolutional neural network as a structural prior to reconstruct images from noisy or incomplete data. It functions as a neural network image prior, utilizing the inherent biases of the network architecture to restore pixels without the need for a training dataset or any prior learning. The system performs zero-shot image restoration by treating the network architecture itself as a regularization term. It uses a randomly initialized encoder-decoder structure and iterative gradient descent to minimize pixel-wise loss, recove
Base pretrained models and datasets in PyTorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)