19 repositories
Algorithms for maintaining persistent identity and spatial coordinates of objects across video frames.
Distinguishing note: Focuses on state-based tracking for behavioral analysis.
Explore 19 GitHub repositories matching Artificial Intelligence & ML · Object Tracking.
Ultralytics is a comprehensive computer vision framework designed for training, validating, and deploying deep learning models across a wide range of visual recognition tasks. It provides a unified interface for core operations including object detection, instance segmentation, pose estimation, and image classification. By utilizing a modular architecture, the platform allows users to swap model components to balance inference speed and accuracy requirements for diverse applications. The framework distinguishes itself through its support for real-time processing and flexible deployment. It in
Adjusts confidence thresholds and matching logic through configuration files to define specific tracking behaviors.
Supervision is a computer vision toolset for normalizing model outputs, managing datasets, and visualizing annotations. It provides a framework to convert predictions from various classification and detection models into a standardized data format to ensure interoperability across different computer vision pipelines. The library features a post-processor for filtering, counting, and tracking detected objects across image frames and video streams. It includes capabilities for large image tiling to improve the detection of small objects and tools for assigning persistent identities to objects t
Assigns persistent identifiers to detected objects across video frames to maintain identity over time.
Frigate is a self-hosted network video recorder that functions as a private, local AI-powered vision engine. It manages video streams by performing real-time object detection, tracking, and classification directly on local hardware, ensuring that security monitoring and activity recording remain independent of cloud services. The system distinguishes itself through a modular, hardware-accelerated video pipeline that offloads intensive decoding and machine learning inference to dedicated GPUs, NPUs, or specialized accelerators like Coral TPUs and Hailo modules. It utilizes state-based object t
Maintains persistent identity and spatial coordinates for detected objects across consecutive frames to enable behavioral analysis and loitering detection.
PaddleDetection is an object detection framework designed for the end-to-end development, training, and deployment of computer vision models. It provides a comprehensive library of modular neural network architectures and pipelines that support object detection, instance segmentation, and multi-object tracking tasks. The project distinguishes itself through a configuration-driven approach that decouples model components like backbones and heads, allowing for the flexible assembly of custom vision workflows. It incorporates advanced techniques such as anchor-free detection logic, joint detecti
Adjusts model settings to recognize and track custom object classes by updating class counts and label mappings.
This is a real-time object detection framework built on the YOLOv3 architecture, implemented in PyTorch. It provides a complete pipeline for identifying and localizing objects in images and video using a single neural network pass, combining a Darknet-53 backbone with multi-scale feature pyramids and anchor-based bounding box prediction. The framework extends beyond basic detection to include instance segmentation, human pose estimation, and multi-object tracking across video frames. It offers a model export toolkit that converts trained models through ONNX to CoreML, TensorFlow Lite, and Ten
Assigns a persistent ID to each detected object and follows its movement through a video sequence.
tracking.js is a browser computer vision library written in JavaScript for performing real-time image analysis and object tracking directly within a web browser. It functions as a real-time object tracker, a color tracking tool, and a face detection utility. The library enables the detection and monitoring of specific color ranges, human faces, and known visual patterns across consecutive video frames. It extracts visual features and descriptors from images to identify distinct landmarks for matching and tracking. The project covers broad computer vision capabilities, including the ability t
Maintains persistent identity and spatial coordinates of objects across consecutive video frames.
Boxmot is a multi-object tracking framework designed to follow multiple objects across video frames using motion and appearance algorithms to maintain consistent identities. It functions as a system for tracking objects with specific orientations using rotated bounding boxes and corresponding intersection-over-union computations. The project includes a re-identification model optimizer that converts neural networks into formats for hardware-accelerated execution. It also features an evolutionary hyperparameter tuner that iteratively mutates tracker settings to maximize accuracy for specific d
Implements tracking for objects with specific orientations using rotated bounding boxes to improve accuracy for angled items.
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
Analyzes the movement and flow statistics of identified objects across sequences of video frames.
ccv is a computer vision library written in C designed for high-performance visual analysis. It serves as a framework for image classification, object detection, and the identification of faces, pedestrians, and vehicles. The library distinguishes itself through hardware-accelerated vision and deep learning inference optimizations. It utilizes a quantized tensor processor to transform floating-point data into eight-bit integers and implements integer-quantized attention mechanisms to reduce memory bandwidth and increase data throughput. The project covers a broad range of capabilities, inclu
Maintains the identity and position of specific objects across sequential video frames over long-term periods.
clmtrackr is a JavaScript computer vision library designed for facial landmark detection and real-time tracking. It implements Constrained Local Models to identify specific coordinate points on a human face within video feeds or static images. The project functions as a real-time face warping engine and expression analysis tool. It can distort facial images via parametric models to create caricatures or identify and label emotional states such as happiness, sadness, anger, and surprise based on feature coordinates. The library covers a broad range of capabilities including automatic and manu
Configures response calculation methods using grayscale, gradients, or binary patterns to balance processing speed and accuracy.
ByteTrack is a multi-object tracking framework that implements the ByteTrack algorithm, an ECCV 2022 method designed to recover occluded objects and reduce trajectory fragmentation. The core innovation of the project is its association algorithm, which processes every detection box—including low-confidence ones—by using separate high and low score thresholds, Kalman filter motion prediction, and Hungarian algorithm matching to produce consistent object identities across video frames. The project distinguishes itself by its comprehensive approach to handling occlusions and fragmented trajector
Employs Kalman filter linear motion models to predict object positions between video frames.
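The two-threshold association described above can be sketched in a few lines. This is a minimal illustration, not the project's code: it uses greedy IoU matching as a simple stand-in for Hungarian matching, skips the Kalman prediction step, and the names `two_stage_associate`, `greedy_match`, and the threshold values are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, min_iou):
    """Greedy IoU matching (stand-in for Hungarian matching).

    tracks: dict of track_id -> box; dets: list of (box, score).
    Returns a dict of track_id -> index into dets.
    """
    pairs = sorted(
        ((iou(box, det[0]), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(dets)),
        reverse=True,
    )
    matches, used = {}, set()
    for overlap, tid, j in pairs:
        if overlap < min_iou:
            break  # remaining pairs overlap even less
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    return matches


def two_stage_associate(tracks, detections, high=0.6, low=0.1, min_iou=0.3):
    """Match high-score detections first, then give unmatched tracks a
    second chance against low-score detections (hypothetical thresholds)."""
    high_dets = [d for d in detections if d[1] >= high]
    low_dets = [d for d in detections if low <= d[1] < high]

    first = greedy_match(tracks, high_dets, min_iou)
    leftover = {tid: box for tid, box in tracks.items() if tid not in first}
    second = greedy_match(leftover, low_dets, min_iou)

    assigned = {tid: high_dets[j][0] for tid, j in first.items()}
    assigned.update({tid: low_dets[j][0] for tid, j in second.items()})
    # Only unmatched high-score detections are candidates for new tracks.
    matched_high = set(first.values())
    new_candidates = [d[0] for j, d in enumerate(high_dets) if j not in matched_high]
    return assigned, new_candidates
```

The second pass is what lets a partially occluded object, whose detection score has dropped, keep its existing ID instead of being discarded and later re-created as a new track.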
This is an open-source autonomous driving perception pipeline that processes camera and lidar sensor data to detect, track, and fuse objects in real-world driving environments. The project integrates an end-to-end perception workflow combining sensor calibration, deep learning object detection, Kalman filter tracking, and sensor fusion for robust scene understanding. The pipeline includes camera calibration tools to remove lens distortion from raw images, deep learning model training for object classification and detection, and multi-object tracking using Kalman filters with data association
Maintains and updates tracks for multiple objects using Kalman filters and data association techniques.
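DeepSORT is a real-time multi-object tracking framework designed to maintain consistent identities for multiple objects across video frames. It integrates deep-learning appearance features with motion descriptors to track objects through sequences of video data. The system uses a deep convolutional neural network to generate high-dimensional visual descriptors for person re-identification. These appearance features are combined with motion estimates from a Kalman filter, and the resulting assignment problem is solved with the Hungarian algorithm to optimally associate detections with existing tracks. The framework includes gating-based association filtering and state-based track management for handling object lifecycles. It also provides tools for rendering tracking results onto video frames and for evaluating tracking performance against established benchmarks.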
Employs Kalman filters to predict future object positions based on velocity and bounding box coordinates.
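As a rough illustration of Kalman-based position prediction, the sketch below tracks a single coordinate (for example, the x position of a box center) with a constant-velocity model. It is a simplified assumption-laden example, not DeepSORT's filter: real trackers use a larger state covering the whole bounding box, and the class name `ConstantVelocityKalman1D` and all noise values here are hypothetical.

```python
class ConstantVelocityKalman1D:
    """Kalman filter over state [position, velocity] for one coordinate."""

    def __init__(self, position, pos_var=1.0, vel_var=10.0,
                 process_var=0.01, meas_var=1.0):
        self.p = float(position)  # estimated position
        self.v = 0.0              # estimated velocity (unknown at start)
        self.P = [[pos_var, 0.0], [0.0, vel_var]]  # state covariance
        self.q = process_var      # process noise added to each diagonal term
        self.r = meas_var         # measurement noise variance

    def predict(self, dt=1.0):
        """Propagate the state one step: p <- p + v * dt."""
        self.p += self.v * dt
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]]
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.p

    def update(self, z):
        """Correct the state with a position measurement z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r            # innovation variance
        k0, k1 = a / s, c / s     # Kalman gain
        residual = z - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        # P <- (I - K H) P
        self.P = [
            [(1 - k0) * a, (1 - k0) * b],
            [c - k1 * a, d - k1 * b],
        ]
```

Calling `predict()` once per frame and `update(z)` whenever a detection is matched lets the filter learn the velocity; during a missed detection, `predict()` alone carries the track forward.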
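Navigation2 is a ROS 2 navigation framework for autonomous mobile robots. It provides the core functionality of path planners, a costmap management system, kinematic controllers, and a behavior tree orchestrator for computing collision-free paths and executing movement commands. The framework is distinguished by its use of behavior trees to coordinate modular task servers, enabling complex navigation flows and autonomous recovery actions. It supports a plugin architecture that allows planners and controllers to be switched at runtime to suit different environments. The system covers a broad range of capabilities, including global and local path planning, 2D SLAM and map-based localization, and environment modeling through grid costmaps. It manages motion control for various drivetrains and integrates safety systems for collision monitoring and emergency prevention. Additional orchestration features include multi-destination waypoint sequencing, dynamic target tracking, and automatic docking procedures. The framework uses lifecycle-managed components to coordinate the startup, shutdown, and health monitoring of its servers.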
Tracks a moving target via detection topics to maintain a specified distance.
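This project is a multi-object tracking framework designed to assign persistent identities to bounding boxes detected in consecutive video frames. It works as a computer vision tracking algorithm that monitors multiple moving targets in real time by associating detections with consistent labels. The system uses Kalman-filter-based state estimation to predict future object positions and maintain identities during detection gaps. It applies the Hungarian algorithm for optimal data association and computes intersection over union (IoU) to match predicted track positions with actual detections. The processing pipeline manages a registry of active tracks using a linear constant-velocity model to simplify state transitions. It performs recursive frame-by-frame processing, updating the state of every tracked object as each new image is analyzed.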
Employs a Kalman filter to predict future object positions and maintain tracking continuity.
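The IoU-based association step can be sketched as follows. To stay dependency-free, this example finds the optimal assignment by brute force over permutations, which is only practical for a handful of boxes; it stands in for the Hungarian algorithm (available in practice as `scipy.optimize.linear_sum_assignment`). The function name `optimal_assignment` and the `min_iou` threshold are assumptions for illustration.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def optimal_assignment(predicted, detections, min_iou=0.3):
    """Pair predicted track boxes with detections so that total IoU is maximal.

    Returns (track_index, detection_index) pairs; pairs whose IoU falls
    below min_iou are dropped and treated as unmatched.
    """
    scores = [[iou(t, d) for d in detections] for t in predicted]
    n_t, n_d = len(predicted), len(detections)
    if n_t <= n_d:
        candidates = ([(i, p[i]) for i in range(n_t)]
                      for p in permutations(range(n_d), n_t))
    else:
        candidates = ([(p[j], j) for j in range(n_d)]
                      for p in permutations(range(n_t), n_d))

    best_total, best_pairs = -1.0, []
    for pairs in candidates:
        total = sum(scores[i][j] for i, j in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return [(i, j) for i, j in best_pairs if scores[i][j] >= min_iou]
```

Tracks left out of the result would continue on prediction alone for the current frame, and unmatched detections would be candidates for new tracks.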
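FairMOT is a multi-object tracking framework and deep learning model designed to identify and track multiple entities across video frames. It implements a unified pipeline that integrates object detection and identity re-identification (Re-ID) into a single-stage joint network. The system uses an anchor-free detection method to predict object centers and bounding box sizes. It maintains identity consistency between consecutive frames by generating high-dimensional embedding vectors for re-identification and applying a Kalman filter for motion state prediction. The framework covers a broad range of computer vision capabilities, including real-time object detection and track assignment using the Hungarian algorithm. It also includes tools for training models on custom image datasets and for generating video visualizations with overlaid bounding boxes and persistent identifiers.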
Uses a Kalman filter to model motion state and predict future object locations during occlusions.
This project is a multi-object tracking library and computer vision toolkit designed to maintain consistent identity IDs for objects across video frames. It provides a motion-based object tracking system that converts raw detections into stable temporal tracks, enabling the analysis of object movement and behavior over time. The toolkit distinguishes itself through advanced identity maintenance, utilizing Kalman filters for linear motion tracking and sparse optical flow for camera motion estimation. It features multi-stage object association to recover occluded objects and non-linear motion t
Maintains consistent object IDs across video frames using motion-based tracking algorithms.
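This project is a computer vision system designed for face recognition and identity tracking using live camera feeds. It provides a framework for capturing, registering, and recognizing multiple people simultaneously by comparing live video input against a local database of pre-registered face descriptors. The system stands out through a performance-oriented processing pipeline that balances computational load during real-time analysis. By combining deep neural network feature extraction with centroid-based object tracking, the software maintains consistent identity labels across video frames while minimizing how often the expensive recognition computation runs. This approach enables stable tracking and recognition of multiple people without full processing of every frame. The library supports a range of identity management tasks, including creating searchable face databases and automatically logging people. It handles the full lifecycle of biometric data, from initially extracting unique numerical vectors from camera images to persisting those descriptors on the local file system for future verification.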
Maintains persistent identity and spatial coordinates of faces across video frames using centroid movement.
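Centroid-based tracking of this kind can be sketched as below. This is a minimal illustration under stated assumptions, not the project's implementation: the class name `CentroidTracker`, the distance limit, and the missed-frame limit are all hypothetical, and matching is a simple nearest-centroid greedy pass.

```python
import math


class CentroidTracker:
    """Assigns persistent IDs to boxes by matching box centers between frames."""

    def __init__(self, max_distance=50.0, max_missed=5):
        self.max_distance = max_distance  # farthest a centroid may move per frame
        self.max_missed = max_missed      # frames a track may go unmatched
        self.next_id = 0
        self.centroids = {}               # track_id -> (x, y)
        self.missed = {}                  # track_id -> consecutive unmatched frames

    def update(self, boxes):
        """boxes: list of (x1, y1, x2, y2). Returns dict of track_id -> centroid."""
        new = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]

        # Consider the closest (track, detection) pairs first.
        pairs = sorted(
            (math.dist(old, cur), tid, j)
            for tid, old in self.centroids.items()
            for j, cur in enumerate(new)
        )
        assigned, used = {}, set()
        for distance, tid, j in pairs:
            if distance > self.max_distance:
                break
            if tid in assigned or j in used:
                continue
            assigned[tid] = j
            used.add(j)

        for tid in list(self.centroids):
            if tid in assigned:
                self.centroids[tid] = new[assigned[tid]]
                self.missed[tid] = 0
            else:
                self.missed[tid] += 1
                if self.missed[tid] > self.max_missed:
                    del self.centroids[tid]
                    del self.missed[tid]

        # Unmatched detections become new tracks.
        for j, centroid in enumerate(new):
            if j not in used:
                self.centroids[self.next_id] = centroid
                self.missed[self.next_id] = 0
                self.next_id += 1
        return dict(self.centroids)
```

In a pipeline like the one described, only IDs that appear for the first time would need to go through the expensive face recognition step; on later frames the identity label can follow the tracked centroid.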
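This project is a computer vision pipeline that integrates object detection and tracking to monitor moving objects in video streams. It serves as an end-to-end analysis tool that processes video frames to identify and classify objects and to maintain their unique identities as they move through the scene. The system combines deep learning inference for detection with motion estimation for temporal continuity. By pairing visual appearance descriptors with predictive motion modeling, it maintains object identities even under temporary occlusion or insufficient spatial overlap. The framework uses sequential processing to synchronize detections with the tracking logic, allowing continuous monitoring of motion patterns. Beyond basic tracking, the software can quantify activity within a video source. It supports counting the total number of objects or vehicles that cross a specified line or enter a specific zone. The implementation is structured as a development framework for building custom vision applications that interpret and extract data from dynamic environments.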
Predicts future object positions using Kalman filters to maintain tracking during temporary occlusions.
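Line-crossing counts of the kind described above depend on persistent track IDs: a crossing is registered when the same ID appears on opposite sides of the line in successive frames. The sketch below is one simple way to do this, not the project's code; `LineCounter` and `side_of_line` are hypothetical names, the line is treated as infinite rather than as a bounded segment, and crossings in both directions are counted.

```python
def side_of_line(point, a, b):
    """Sign of the cross product (b - a) x (point - a).

    Positive on one side of the line through a and b, negative on the
    other, zero exactly on the line.
    """
    return (b[0] - a[0]) * (point[1] - a[1]) - (b[1] - a[1]) * (point[0] - a[0])


class LineCounter:
    """Counts tracked objects whose position switches sides of a line."""

    def __init__(self, a, b):
        self.a, self.b = a, b
        self.last_side = {}  # track_id -> last non-zero side value
        self.count = 0

    def update(self, positions):
        """positions: dict of track_id -> (x, y) for the current frame.

        Returns the running total of crossings.
        """
        for tid, point in positions.items():
            side = side_of_line(point, self.a, self.b)
            if side == 0:
                continue  # exactly on the line: wait for the next frame
            previous = self.last_side.get(tid)
            if previous is not None and (previous > 0) != (side > 0):
                self.count += 1
            self.last_side[tid] = side
        return self.count
```

Because the side is stored per track ID, an ID switch near the line would produce a missed or spurious count, which is why stable identities through occlusion matter for this kind of analytics.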