30 open-source projects similar to zylo117/yet-another-efficientdet-pytorch, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Yet Another EfficientDet Pytorch alternative.
This project is a PyTorch implementation of the YOLOv3 object detection architecture. It functions as a real-time object detector and computer vision framework designed to identify and locate multiple objects within images using bounding boxes and class labels. The system allows for both the use of pretrained weights for immediate image analysis and the training of custom models using datasets with bounding box annotations. It provides a programmatic interface to integrate detection capabilities directly into other software applications. The framework includes tools for model evaluation to m
This is a PyTorch object detection framework that implements the Single Shot MultiBox Detector for identifying and localizing multiple objects within images and video. The project provides a neural network architecture designed for single-shot object detection, which predicts bounding boxes and class labels in one pass. The implementation includes a real-time object detector capable of processing live video streams to track and label objects across sequential frames. It also features a complete computer vision training pipeline for preparing image datasets and training model weights. The fra
PaddleDetection is an object detection framework designed for the end-to-end development, training, and deployment of computer vision models. It provides a comprehensive library of modular neural network architectures and pipelines that support object detection, instance segmentation, and multi-object tracking tasks. The project distinguishes itself through a configuration-driven approach that decouples model components like backbones and heads, allowing for the flexible assembly of custom vision workflows. It incorporates advanced techniques such as anchor-free detection logic, joint detecti
YOLOv9 is a real-time computer vision framework and deep learning model designed for image classification, object detection, and instance segmentation. It functions as both a vision model and a trainer, allowing for the optimization of neural network weights on custom datasets using single or multiple GPUs. The framework utilizes programmable gradient information to perform high-speed identification and location of multiple objects within images and video streams. It extends beyond bounding box detection to provide instance segmentation and panoptic segmentation, which labels every pixel in a
This is a real-time object detection framework built on the YOLOv3 architecture, implemented in PyTorch. It provides a complete pipeline for identifying and localizing objects in images and video using a single neural network pass, combining a Darknet-53 backbone with multi-scale feature pyramids and anchor-based bounding box prediction. The framework extends beyond basic detection to include instance segmentation, human pose estimation, and multi-object tracking across video frames. It offers a model export toolkit that converts trained models through ONNX to CoreML, TensorFlow Lite, and Ten
YOLOv7 is a PyTorch vision library and real-time inference engine designed for object detection, human pose estimation, and instance segmentation. It provides a framework for detecting and locating multiple objects within images or video streams using neural networks. The system includes tools for custom model training and fine-tuning, allowing pre-trained weights to be adapted to specialized datasets via transfer learning. It also supports model weight export and format conversion to facilitate deployment on production servers and embedded edge devices.
RT-DETR is a real-time object detection model based on the detection transformer architecture. It is implemented as a computer vision model for both the PyTorch and PaddlePaddle deep learning platforms, designed to identify and locate multiple objects in images and video streams. The model eliminates the need for anchor generation and non-maximum suppression by utilizing a transformer-based approach. It focuses on high-performance detection, balancing precision and low latency for live environment deployment. The system employs a hybrid encoder and multi-scale feature fusion to extract globa
This project is a PyTorch implementation of the YOLOv4 object detection framework. It provides a system for training and deploying neural networks that identify and locate multiple objects within images and video streams. The framework includes tools for converting trained weights into universal formats and hardware-specific optimized engines, specifically supporting ONNX and TensorRT. It features a TensorRT inference optimizer to reduce latency and increase throughput, as well as a model architecture compatible with NVIDIA DeepStream streaming analytics pipelines. The system covers model tr
Darknet is a high-performance C-based inference engine and computer vision library designed for real-time object identification and localization. It serves as a neural network framework for training and deploying detection models using the YOLO architecture, providing a toolset for deep learning training and deployment. The project differentiates itself through a C and CUDA implementation that enables hardware acceleration for matrix multiplication and inference speed optimization. It provides a shared library interface for embedding detection capabilities into external applications and suppo
YOLOv5 is a comprehensive computer vision framework designed for end-to-end deep learning, specializing in real-time object detection, image classification, and instance segmentation. It provides a unified toolkit that manages the entire lifecycle of a model, from initial dataset configuration and hyperparameter tuning to high-speed inference and deployment. The framework utilizes a modular neural architecture, allowing users to swap backbone and head components to tailor models for specific visual tasks. What distinguishes this project is its focus on production-ready deployment and model ef
NanoDet-Plus is a super fast and lightweight anchor-free object detection model. It is only 980 KB (int8) / 1.8 MB (fp16) and runs at 97 FPS on a cellphone.
RF-DETR is a Python library for training and deploying object detection, instance segmentation, and keypoint detection models built on a vision transformer architecture. It provides a unified command-line interface and Python API for the full workflow, from fine-tuning pretrained checkpoints on custom datasets to running inference on images, video files, and live camera streams. The project supports training on datasets in COCO or YOLO format, with automatic format detection and configurable augmentation pipelines. Models can be exported to ONNX, TFLite, or TensorRT for deployment across edge
YOLOv10 is a PyTorch computer vision library and real-time vision framework designed for locating and identifying multiple objects in images and video streams. It functions as an end-to-end object detector that optimizes for high-speed deployment and detection precision. The project is distinguished by an NMS-free detection architecture that predicts a single bounding box per object, eliminating the need for non-maximum suppression post-processing to reduce inference latency. It further optimizes for edge hardware through scalable weights and a quantization-friendly structure that facilitates
This project is an object detection framework implementing the YOLOv3 architecture using Keras and TensorFlow. It functions as a deep learning vision model and computer vision toolset designed to locate and classify multiple entities within images and video streams using bounding boxes. The system includes a multi-GPU inference engine to distribute computational loads across several graphics processing units. It also provides a pipeline for creating custom object detectors by retraining pre-trained weights on annotated datasets to recognize user-defined object classes. The framework covers m
This project is a TensorFlow and Keras implementation of the Mask R-CNN architecture. It provides a framework for performing simultaneous object detection and instance segmentation, transforming raw images into segmented masks and bounding boxes for individual object identification. The toolset enables custom computer vision training through fine-tuning pre-trained weights and integrating user-provided datasets. It includes capabilities for distributed GPU training to accelerate the optimization of large vision models. The framework covers model evaluation using standard precision metrics an
This project is a deep learning curriculum built as a collection of PyTorch tutorials. It provides a structured set of technical documents and runnable notebooks that translate theoretical machine learning concepts into executable code. The repository includes implementation guides for various neural network architectures, specifically covering convolutional, recurrent, and transformer-based models. It provides practical examples for building computer vision pipelines for object detection and semantic segmentation, as well as natural language processing tools f
Kaolin is a PyTorch 3D deep learning library providing a comprehensive suite of tools for 3D geometry processing, physics simulation, data visualization, and gradient-based rendering for computer vision. The library includes a differentiable 3D renderer and a geometry processing toolkit for converting and transforming 3D representations such as meshes and point clouds. It also features a 3D physics simulation engine to calculate physical interactions and collisions between three-dimensional objects and scenes. The toolkit provides utilities for 3D data visualization, including the creation o
This project is a PyTorch person re-identification framework designed for training and evaluating models that identify individuals across different camera views. It provides a complete model training pipeline, a deep learning feature extractor for converting images into numeric vectors, and a suite of computer vision benchmarking tools to measure identity retrieval accuracy. The framework includes a specialized transfer learning toolkit that supports layer freezing, staged learning rate optimization, and differential learning rates for fine-tuning pretrained models. It distinguishes itself th
This project is a PyTorch object detection framework that implements the Faster R-CNN architecture. It serves as a vision model for predicting precise bounding boxes around multiple objects within images and live video feeds. The system is optimized for multi-GPU training to reduce the time required for model convergence. It utilizes a GPU-accelerated design to handle the training and inference of complex detection networks. The framework covers the full object detection lifecycle, including custom network training and inference for static images and real-time video streams. It includes capa
Deformable-DETR is an object detection system for computer vision that uses a transformer-based encoder-decoder architecture. It identifies and locates objects within images by representing potential targets as a set of learnable queries. The project employs sampling-based attention to restrict attention to a small set of points around a reference, reducing computational complexity and speeding up convergence. It further utilizes multi-scale feature fusion to detect objects of varying sizes within a single frame. The system includes capabilities for training models across multiple GPU cluste
This project is a PyTorch implementation of a research architecture designed for high-resolution representation learning. It serves as a computer vision framework focused on precise keypoint detection, human pose estimation, and semantic image segmentation. The implementation provides specialized tools for identifying anatomical landmarks on the human body and predicting facial keypoint coordinates to analyze orientation and alignment. It utilizes a system of multi-resolution parallel streams and repeated multi-scale fusion to maintain high-resolution representations throughout the network.
This project provides a suite of lightweight face detection models designed for high-speed inference on edge computing devices. It centers on a compact neural network architecture that enables human face detection within environments characterized by limited compute resources and power constraints. The system features quantized face detectors available in multiple formats to ensure compatibility across diverse hardware architectures. It includes utilities for model export and quantization, allowing trained weights to be converted into standardized formats for hardware-agnostic deployment. Th
DenseNet is a computer vision model and convolutional neural network implementation designed for image recognition and classification tasks. It uses a densely connected architecture in which each layer receives the feature maps of all preceding layers in its dense block, which improves feature propagation. The implementation reduces the number of parameters while maintaining accuracy through this dense connectivity and feature-map concatenation. It supports model construction using both standard and bottleneck-compressed architectures, with configurable network depth and growth rates to balance inference time an
This is a PyTorch implementation of EfficientNet convolutional neural networks. It serves as a computer vision model library providing architectures for image classification and high-level feature extraction, including pre-trained weights for immediate image categorization. The library supports transfer learning by allowing the modification of model architectures and output layers to accommodate a custom number of classes for new datasets. It also includes a model exporter to convert trained PyTorch weights into the ONNX format for production inference. The system covers broader computer vis
This project is a pretrained model library for PyTorch, providing a collection of convolutional neural network architectures and weights. It serves as a computer vision model zoo for image classification and feature extraction, offering a framework for transfer learning where pretrained networks are adapted for custom image recognition tasks. The library focuses on transforming images into high-level numerical representations and calculating class probability scores. It includes utilities for downloading and initializing standard architectures such as ResNet, Inception, and Xception. Capabil
DeepLabCut is a deep learning toolkit for markerless 2D and 3D animal pose estimation. It functions as a motion tracking system that identifies anatomical keypoints on animals in video sequences without the need for physical markers. The framework utilizes transfer learning and a library of pre-trained weights to accelerate the training of networks for different species. It supports multi-individual identity tracking to maintain unique identities across video sequences and offers real-time pose detection for live video feeds. The system covers a broad range of computer vision capabilities, i
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
DAIN is a video frame synthesis engine and AI video upsampling tool designed to increase video playback smoothness. It functions as a computer vision model that synthesizes intermediate frames between existing images to transform low frame rate video into high frame rate content. The system utilizes depth-aware video frame interpolation to predict the motion of pixels between consecutive images. By analyzing spatial depth via depth maps, the tool generates new frames that account for occlusions and overlapping objects to create slow motion effects. The framework incorporates optical flow int
Detectron is a Caffe2-based object detection framework and computer vision research platform. It provides implementations of neural network architectures for locating and identifying objects in images, including Mask R-CNN for generating instance segmentation masks and RetinaNet for one-stage detection. The platform supports computer vision prototyping and object detection research through the deployment of pre-trained baseline models. This allows for the rapid implementation and evaluation of visual recognition systems. Its capabilities cover image object localization and instance segmentation w
This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning library. It provides practical examples for building, training, and deploying deep learning models, functioning as a toolkit for developers to explore neural network architectures and training workflows. The project distinguishes itself by offering concrete demonstrations of complex machine learning operations, ranging from computer vision tasks like object detection and depth estimation to the training of large-scale transformer models. These examples illustrate how to implement