30 open-source projects similar to nwojke/deep_sort, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Deep Sort alternative.
This project is a multi-object tracking library and computer vision toolkit designed to maintain consistent IDs for objects across video frames. It provides a motion-based object tracking system that converts raw detections into stable temporal tracks, enabling the analysis of object movement and behavior over time. The toolkit distinguishes itself through advanced identity maintenance, utilizing Kalman filters for linear motion tracking and sparse optical flow for camera motion estimation. It features multi-stage object association to recover occluded objects and non-linear motion t
ByteTrack is a multi-object tracking framework that implements the ByteTrack algorithm, an ECCV 2022 method designed to recover occluded objects and reduce trajectory fragmentation. The core innovation of the project is its association algorithm, which processes every detection box—including low-confidence ones—by using separate high and low score thresholds, Kalman filter motion prediction, and Hungarian algorithm matching to produce consistent object identities across video frames. The project distinguishes itself by its comprehensive approach to handling occlusions and fragmented trajector
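The two-threshold association described in the ByteTrack entry can be illustrated with a minimal sketch. This is not the project's code: the function names (`iou`, `greedy_match`, `associate`), the threshold defaults, and the box format `(x1, y1, x2, y2)` are all hypothetical. It assumes the track boxes have already been advanced by a motion model such as a Kalman filter, and it swaps in a simple greedy IoU matcher for the Hungarian algorithm to stay dependency-free.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, min_iou):
    """Greedy IoU matching (a stand-in for Hungarian matching).

    tracks: {track_id: box}, dets: {det_index: box}.
    Returns {track_id: det_index} for pairs with IoU >= min_iou.
    """
    candidates = sorted(
        ((iou(tb, db), tid, di)
         for tid, tb in tracks.items()
         for di, db in dets.items()),
        reverse=True,
    )
    matches, used_dets = {}, set()
    for score, tid, di in candidates:
        if score < min_iou:
            break  # remaining candidates overlap even less
        if tid in matches or di in used_dets:
            continue
        matches[tid] = di
        used_dets.add(di)
    return matches


def associate(tracks, detections, high_thresh=0.6, low_thresh=0.1, min_iou=0.3):
    """Two-stage association: high-score detections first, then low-score ones.

    tracks: {track_id: predicted_box}; detections: [(box, score), ...].
    Returns (matches, lost_track_ids, new_detection_indices).
    """
    high = {i: box for i, (box, s) in enumerate(detections) if s >= high_thresh}
    low = {i: box for i, (box, s) in enumerate(detections)
           if low_thresh <= s < high_thresh}

    # Stage 1: every track competes for the high-confidence detections.
    first = greedy_match(tracks, high, min_iou)

    # Stage 2: tracks still unmatched get a chance at low-confidence boxes,
    # which is where partially occluded objects tend to end up.
    remaining = {tid: box for tid, box in tracks.items() if tid not in first}
    second = greedy_match(remaining, low, min_iou)

    matches = {**first, **second}
    lost = sorted(set(tracks) - set(matches))
    # Only unmatched high-confidence detections are allowed to start new tracks.
    new = sorted(set(high) - set(first.values()))
    return matches, lost, new


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30), 3: (100, 100, 110, 110)}
detections = [
    ((1, 1, 11, 11), 0.9),        # high score, overlaps track 1
    ((21, 21, 31, 31), 0.3),      # low score, overlaps track 2
    ((50, 50, 60, 60), 0.8),      # high score, no track nearby
    ((200, 200, 210, 210), 0.05), # below the low threshold, ignored
]
matches, lost, new = associate(tracks, detections)
print(matches, lost, new)
```

In this example, track 2 keeps its identity only because the second stage considers the low-score box; a tracker that discarded every detection below the high threshold would have lost it.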
GoCV is a computer vision library and Go language binding for OpenCV. It serves as an image processing toolkit and deep learning inference engine, providing programmatic access to a wide range of algorithms for image manipulation, object detection, and video analysis. The project differentiates itself through high-performance native bindings and hardware acceleration. It utilizes a foreign function interface to map Go calls to C++ functions and includes a hardware-agnostic backend dispatch to route neural network tasks to computation engines such as CUDA and OpenVINO. The library covers a br
This project is a modular research toolkit designed for developing, training, and evaluating deep learning models for object detection, segmentation, and video instance tracking. It provides a flexible training engine that manages complex neural network execution, including distributed training, custom lifecycle hooks, and weight optimization. The framework is built around a hierarchical configuration system that allows users to define architectures, data pipelines, and training hyperparameters through composable, inheritable files. The project distinguishes itself through its highly modular
PaddleDetection is an object detection framework designed for the end-to-end development, training, and deployment of computer vision models. It provides a comprehensive library of modular neural network architectures and pipelines that support object detection, instance segmentation, and multi-object tracking tasks. The project distinguishes itself through a configuration-driven approach that decouples model components like backbones and heads, allowing for the flexible assembly of custom vision workflows. It incorporates advanced techniques such as anchor-free detection logic, joint detecti
Boxmot is a multi-object tracking framework designed to follow multiple objects across video frames using motion and appearance algorithms to maintain consistent identities. It functions as a system for tracking objects with specific orientations using rotated bounding boxes and corresponding intersection-over-union computations. The project includes a re-identification model optimizer that converts neural networks into formats for hardware-accelerated execution. It also features an evolutionary hyperparameter tuner that iteratively mutates tracker settings to maximize accuracy for specific d
This project is a foundation model and research toolkit designed for promptable object segmentation and temporal tracking. It provides a unified framework for isolating specific regions or objects within both static images and dynamic video sequences. The system distinguishes itself through a streaming memory architecture that maintains temporal consistency by storing and retrieving object features across frames. This mechanism allows the model to resolve occlusions and preserve object identity even when targets move out of view or change appearance. By utilizing a shared backbone for both im
This project is a computer vision system for object segmentation and tracking across images and videos. It employs models capable of identifying and masking objects using text prompts, bounding boxes, click points, or image exemplars. The system differentiates itself through memory-based video tracking and shared-memory architectures that maintain consistent object identities over time. It supports multi-object processing in single computation passes to increase frame throughput and utilizes iterative refinement to correct segmentation boundaries through sequential prompts. The software also
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
ImageAI is a Python computer vision library providing a suite of tools for image classification, object detection, and video analytics. It functions as an integrated framework for locating and labeling objects in static images and video streams, utilizing deep learning models for identification and categorization. The project includes a model training toolkit that allows for the creation of custom classifiers and detectors through scratch training or transfer learning. It features a GPU-accelerated inference engine to increase processing speed for vision tasks and includes specialized utiliti
Co-tracker is a PyTorch point tracking framework and dense point tracking model designed to map the motion of individual pixels throughout a video. It functions as a video pixel tracker that predicts point trajectories and visibility masks across sequences of video frames. The project includes a computer vision training pipeline that utilizes teacher-student knowledge distillation. This allows for the generation of pseudo-labels from unannotated real video data to fine-tune pre-trained models and reduce the gap between synthetic and real data environments. The framework provides capabilities
Track-Anything is an AI-driven video object segmentation and tracking system. It utilizes the Segment Anything Model to isolate and mask multiple objects across video frames, providing tools for automated mask propagation and background-filling inpainting. The system distinguishes itself through a multi-object segmentation pipeline that can follow several distinct targets simultaneously. It includes a video inpainting utility to remove tracked objects and replace them with synthesized background content, as well as temporal mask refinement to correct tracking drift. The project covers broad
This project is a PyTorch person re-identification framework designed for training and evaluating models that identify individuals across different camera views. It provides a complete model training pipeline, a deep learning feature extractor for converting images into numeric vectors, and a suite of computer vision benchmarking tools to measure identity retrieval accuracy. The framework includes a specialized transfer learning toolkit that supports layer freezing, staged learning rate optimization, and differential learning rates for fine-tuning pretrained models. It distinguishes itself th
opencv4nodejs is a set of JavaScript wrappers and a C++ native addon that provides Node.js bindings for the OpenCV library. It functions as a computer vision library and image processing framework, exposing high-performance C++ algorithms to a JavaScript environment. The project enables the execution of vision algorithms for detecting faces, tracking objects, and analyzing visual data using deep neural networks. It includes capabilities for data pattern classification, text pattern recognition, and the identification of facial landmarks and gestures. The framework covers a broad capability s
Roboflow Sports is a sports video analysis system that combines object detection and tracking with bird's-eye field visualization. Its core pipeline detects and tracks players, referees, and balls across video frames, then maps those tracked positions onto a radar-style overhead view of the playing field. The system goes beyond basic detection by localizing field boundaries and key landmarks such as pitch lines and corners, enabling spatial mapping of player positions relative to the field geometry. It classifies detected players by team affiliation through visual feature extraction and clust
Gluon-CV is an MXNet computer vision library that provides a comprehensive collection of pre-implemented vision architectures and training pipelines. It serves as a deep learning research toolkit and a model zoo containing state-of-the-art pre-trained weights for image and video analysis. The project includes a specialized human pose estimation library and a model compression toolkit. These tools allow for the pruning and quantization of deep learning models to increase inference speed and facilitate deployment on constrained edge hardware. The library covers a broad range of vision capabili
This is an open-source autonomous driving perception pipeline that processes camera and lidar sensor data to detect, track, and fuse objects in real-world driving environments. The project integrates an end-to-end perception workflow combining sensor calibration, deep learning object detection, Kalman filter tracking, and sensor fusion for robust scene understanding. The pipeline includes camera calibration tools to remove lens distortion from raw images, deep learning model training for object classification and detection, and multi-object tracking using Kalman filters with data association
This project is a SAM-based, multi-modal image segmentation framework and text-to-mask vision model. It isolates distinct objects within images and video by converting natural language prompts and other inputs into pixel-level semantic masks, and it integrates text, image, and audio signals to generate those masks. It includes an interactive video object tracker that isolates and tracks visual entities across video frames using referring images or textual queries. The framework provides capabi
Ultralytics is a comprehensive computer vision framework designed for training, validating, and deploying deep learning models across a wide range of visual recognition tasks. It provides a unified interface for core operations including object detection, instance segmentation, pose estimation, and image classification. By utilizing a modular architecture, the platform allows users to swap model components to balance inference speed and accuracy requirements for diverse applications. The framework distinguishes itself through its support for real-time processing and flexible deployment. It in
CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export. The platform distinguishes itself through deep integration with machine learning models, allowing users to deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports co
OpenCVSharp is a .NET library that wraps native OpenCV functions, providing C# developers with access to OpenCV's computer vision capabilities through an API that mirrors the native C/C++ style. It serves as a managed wrapper for image processing, feature detection, object detection, and image manipulation tasks, while also handling automatic disposal of unmanaged OpenCV resources like Mat objects to prevent memory leaks in .NET applications. The library enables keypoint detection and descriptor extraction using algorithms such as AKAZE, BRISK, or FAST, with brute-force or FLANN-based matchin
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
Hide screen when boss is approaching.
AnimateAnyone is an appearance-preserving video synthesizer designed for character animation from a single static image. It functions as a diffusion image-to-video generator that transforms a source image into a high-fidelity video sequence while maintaining consistent character identity, clothing, and visual details across all frames. The system enables video-driven character reenactment by transferring motions, facial expressions, and body movements from a reference video onto a static character. It employs pose-guided video generation to control movement via skeleton keypoints and pose sig
MiDaS is a PyTorch computer vision library and monocular depth estimation model designed to predict scene depth from single images. It functions as a scene depth predictor that computes distance maps to determine object proximity to the camera. The project enables zero-shot depth transfer, allowing the model to be applied to new datasets or environments without additional training data. It focuses on relative depth regression to predict scale-invariant depth maps. The library includes a real-time depth visualizer for capturing live camera feeds and displaying corresponding depth maps. It als
ccv is a computer vision library written in C designed for high-performance visual analysis. It serves as a framework for image classification, object detection, and the identification of faces, pedestrians, and vehicles. The library distinguishes itself through hardware-accelerated vision and deep learning inference optimizations. It utilizes a quantized tensor processor to transform floating-point data into eight-bit integers and implements integer-quantized attention mechanisms to reduce memory bandwidth and increase data throughput. The project covers a broad range of capabilities, inclu
This project is a collection of educational resources and implementation frameworks providing deep learning model recipes, code samples, and step-by-step guides for computer vision tasks. It organizes complex workflows into modular recipes and implementation guides to facilitate the building of image and video analysis models. The framework focuses on specialized vision capabilities, including an image similarity framework for fast retrieval and re-ranking, human pose estimation, and video action recognition. It also provides specific tools for crowd density estimation and document image clea
The Adversarial Robustness Toolbox (ART) is an open-source library that provides a unified framework for evaluating, defending, and certifying machine learning models against adversarial threats. It wraps models from any framework behind a common estimator interface, enabling composable pipelines for attack generation, defense application, robustness certification, and privacy auditing across evasion, poisoning, and extraction threats. The library distinguishes itself by covering the full adversarial ML security lifecycle within a single toolkit. It supports gradient-based adversarial example
This is a real-time object detection framework built on the YOLOv3 architecture, implemented in PyTorch. It provides a complete pipeline for identifying and localizing objects in images and video using a single neural network pass, combining a Darknet-53 backbone with multi-scale feature pyramids and anchor-based bounding box prediction. The framework extends beyond basic detection to include instance segmentation, human pose estimation, and multi-object tracking across video frames. It offers a model export toolkit that converts trained models through ONNX to CoreML, TensorFlow Lite, and Ten