Open-source libraries and frameworks for detecting and tracking human body keypoints in video or images.
OpenPose is a real-time pose estimation engine designed to detect and track human body, face, hand, and foot keypoints. It handles multiple people at once, locating the keypoints of every individual in a video stream or static image. Beyond 2D detection, it can reconstruct 3D keypoints by triangulating detections from multiple synchronized camera views. The system distinguishes itself through a bottom-up approach that uses part affinity fields (PAFs) to associate detected body parts with the individuals they belong to.
OpenPose is a comprehensive pose estimation engine that supports real-time multi-person tracking and 3D keypoint reconstruction, and it provides Python wrappers for integrating these capabilities into your projects.
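The part-affinity-field association described for OpenPose can be sketched in plain Python. This is a simplified illustration, not OpenPose's implementation: the field is a hypothetical callable that returns a 2D vector for any image location, the line integral is approximated by averaging a fixed number of samples along each candidate limb, and candidate joints are paired greedily by descending score.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Hypothetical field: returns the predicted 2D PAF vector at an (x, y) location.
Field = Callable[[float, float], Tuple[float, float]]


def paf_score(field: Field, a: Point, b: Point, num_samples: int = 10) -> float:
    """Approximate the PAF line integral along the candidate limb a -> b.

    Averages the dot product between the field vector and the unit limb
    direction at evenly spaced points on the segment.
    """
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1)
        fx, fy = field(a[0] + t * dx, a[1] + t * dy)
        total += fx * ux + fy * uy
    return total / num_samples


def greedy_match(field: Field, parents: List[Point], children: List[Point]) -> List[Tuple[int, int]]:
    """Pair parent and child joint candidates by descending PAF score.

    Each candidate is used at most once; pairs with non-positive scores are dropped.
    """
    scored = sorted(
        (
            (paf_score(field, p, c), i, j)
            for i, p in enumerate(parents)
            for j, c in enumerate(children)
        ),
        reverse=True,
    )
    used_parents, used_children, pairs = set(), set(), []
    for score, i, j in scored:
        if score <= 0:
            break
        if i in used_parents or j in used_children:
            continue
        pairs.append((i, j))
        used_parents.add(i)
        used_children.add(j)
    return pairs


if __name__ == "__main__":
    # Toy field pointing straight down everywhere, e.g. for neck -> hip limbs.
    def downward(x: float, y: float) -> Tuple[float, float]:
        return (0.0, 1.0)

    necks = [(0.0, 0.0), (10.0, 0.0)]
    hips = [(10.0, 5.0), (0.0, 5.0)]
    print(greedy_match(downward, necks, hips))  # prints [(1, 0), (0, 1)]
```

Greedy matching per limb type is a common simplification of the bipartite assignment; vertical limbs score 1.0 here, while the crossing diagonal pairings score lower and are rejected.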
DensePose is a dense human pose estimation framework that maps the pixels of people in a 2D image to a 3D surface-based model of the human body in real time. Rather than predicting a sparse set of joints, it associates each human pixel in an RGB image with a body part and with specific coordinates on a 3D body template. These dense correspondences can also be used to transfer photographic textures from images onto digital human models.
DensePose is a comprehensive framework for dense human pose estimation that maps 2D pixels to surface-based body models, providing real-time multi-person inference and the Python-based tools needed for detailed anatomical mapping.
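DensePose-style dense correspondences are commonly stored as an IUV map: for each pixel, a body-part index I (0 for background, 1 to 24 for the surface parts) and (U, V) coordinates within that part, assumed here to lie in [0, 1]. The sketch shows how such a map could drive texture transfer with a nearest-neighbour lookup. The per-part texture dictionary is a hypothetical layout chosen for clarity; DensePose's own texture atlas format may differ.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Color = Tuple[int, int, int]
# One IUV entry per pixel: (part index I, U, V); I == 0 marks background.
IUVPixel = Tuple[int, float, float]
# Hypothetical atlas layout: part index -> 2D grid (rows x cols) of colours.
Atlas = Dict[int, Sequence[Sequence[Color]]]


def sample_tile(tile: Sequence[Sequence[Color]], u: float, v: float) -> Color:
    """Nearest-neighbour lookup of (u, v) in [0, 1] within one part's texture tile."""
    rows, cols = len(tile), len(tile[0])
    col = min(max(int(u * (cols - 1) + 0.5), 0), cols - 1)
    row = min(max(int(v * (rows - 1) + 0.5), 0), rows - 1)
    return tile[row][col]


def transfer_texture(iuv_map: Sequence[Sequence[IUVPixel]], atlas: Atlas) -> List[List[Optional[Color]]]:
    """Colour every human pixel from the atlas; background pixels become None."""
    output: List[List[Optional[Color]]] = []
    for iuv_row in iuv_map:
        out_row: List[Optional[Color]] = []
        for part, u, v in iuv_row:
            if part == 0 or part not in atlas:
                out_row.append(None)
            else:
                out_row.append(sample_tile(atlas[part], u, v))
        output.append(out_row)
    return output
```

Because the lookup goes through body-surface coordinates rather than image coordinates, the same atlas paints a consistent texture onto a person regardless of their pose in the frame.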
VideoPose3D is a machine learning framework for 3D human pose estimation. It reconstructs motion by predicting 3D joint positions from sequences of 2D keypoints in video, using a temporal convolutional network that processes body movement over time. The project includes a semi-supervised training pipeline that improves accuracy by combining labeled datasets with unlabeled video, using a loss that projects predicted 3D poses back to 2D and checks their consistency with the input keypoints. It also includes a visualizer that renders 3D skeleton reconstructions and 2D keypoints as overlays on the original footage.
VideoPose3D is a specialized framework for 3D human pose estimation that provides Python tools for temporal motion reconstruction, though it focuses on lifting existing 2D detections to 3D rather than providing a full real-time multi-person tracking pipeline.
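Two ideas from the VideoPose3D description can be made concrete with a little arithmetic. Stacking dilated temporal convolutions whose dilation grows with each layer lets a single output see a long window of frames, and the semi-supervised loss compares 2D reprojections of predicted 3D joints with the input keypoints. The sketch assumes that each layer's dilation equals the product of the earlier filter widths, and it uses a simple pinhole camera with a hypothetical focal length and principal point; VideoPose3D's actual loss may include additional terms.

```python
import math
from typing import Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def receptive_field(filter_widths: Sequence[int]) -> int:
    """Frames seen by one output of a stack of dilated temporal convolutions.

    Assumes layer i uses a dilation equal to the product of the filter widths
    of all earlier layers, so the window grows multiplicatively.
    """
    field, dilation = 1, 1
    for width in filter_widths:
        field += (width - 1) * dilation
        dilation *= width
    return field


def project(point: Point3, focal: float, center: Point2) -> Point2:
    """Pinhole projection of a camera-space 3D point (z > 0) to pixels."""
    x, y, z = point
    return (focal * x / z + center[0], focal * y / z + center[1])


def reprojection_loss(pred3d: Sequence[Point3], keypoints2d: Sequence[Point2],
                      focal: float, center: Point2) -> float:
    """Mean Euclidean distance between projected 3D joints and 2D keypoints."""
    if not pred3d or len(pred3d) != len(keypoints2d):
        raise ValueError("need the same, non-zero number of 3D and 2D joints")
    total = 0.0
    for p3, (u, v) in zip(pred3d, keypoints2d):
        pu, pv = project(p3, focal, center)
        total += math.hypot(pu - u, pv - v)
    return total / len(pred3d)
```

Under the dilation assumption, five layers of width 3 give `receptive_field([3, 3, 3, 3, 3]) == 243` frames. The reprojection loss needs no 3D ground truth, which is what allows unlabeled video to contribute to training.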
MMPose is a PyTorch-based pose estimation toolbox and deep learning training pipeline for detecting 2D and 3D keypoints on humans, animals, and faces. It serves as a computer vision model zoo and as a framework for both 2D pose estimation and 3D pose lifting. The project is distinguished by its modular, extensible architecture: a registry-based system and hierarchical configurations make it possible to integrate custom algorithms and customize model pipelines. It supports diverse estimation paradigms, including top-down, bottom-up, and two-stage pose-lifting workflows.
MMPose is a comprehensive PyTorch-based framework that provides pre-trained models, support for both 2D and 3D human keypoint detection, and the necessary tools for multi-person tracking and real-time deployment.
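The registry-plus-configuration pattern mentioned for MMPose can be illustrated with a minimal, hypothetical `Registry` class. It is loosely modelled on the style used across OpenMMLab projects (a decorator registers classes by name, and a config dict with a `"type"` key selects one), but it is not MMPose's or MMEngine's implementation; `ToyHeatmapHead` and `merge_cfg` are illustrative names.

```python
from typing import Any, Callable, Dict


class Registry:
    """Hypothetical minimal registry mapping class names to classes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        """Return a class decorator that records the class under its name."""
        def decorator(cls: type) -> type:
            if cls.__name__ in self._modules:
                raise KeyError(f"{cls.__name__} is already registered in {self.name}")
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        """Instantiate the class named by cfg['type'] with the remaining keys."""
        args = dict(cfg)
        cls_name = args.pop("type")
        if cls_name not in self._modules:
            raise KeyError(f"{cls_name} is not registered in {self.name}")
        return self._modules[cls_name](**args)


MODELS = Registry("models")


@MODELS.register_module()
class ToyHeatmapHead:
    """Illustrative component; a real head would hold network layers."""

    def __init__(self, in_channels: int, num_keypoints: int) -> None:
        self.in_channels = in_channels
        self.num_keypoints = num_keypoints


def merge_cfg(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively override nested config values without mutating the base."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_cfg(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For example, a base config `{"head": {"type": "ToyHeatmapHead", "in_channels": 32, "num_keypoints": 17}}` can be specialised by a child config that overrides only `num_keypoints`, and `MODELS.build(...)` then constructs the head without any code change. This decoupling is what lets new algorithms plug in by registration alone.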
VIBE is a 3D human pose estimation framework that reconstructs human body shape and pose from video frames. It predicts the parameters of the SMPL human body model to generate 3D mesh sequences. The system includes an exporter that converts predicted pose sequences into standard 3D file formats for use in graphics and animation software. It also provides a structured training pipeline for preparing datasets and training models that estimate body shape from images.
VIBE provides 3D human pose and mesh reconstruction from video, with a Python API, multi-person tracking, and pre-trained models for 3D keypoint estimation.
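VIBE predicts SMPL parameters per frame. In the standard SMPL parameterization, the pose is 72 numbers (an axis-angle rotation for each of 24 joints, the first being the global root orientation) and the shape is 10 coefficients. The sketch converts such a pose vector into per-joint rotation matrices with Rodrigues' formula; it assumes this flat 72-value layout and is a generic illustration rather than VIBE's code.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

NUM_SMPL_JOINTS = 24            # joint 0 carries the global root orientation
POSE_DIM = NUM_SMPL_JOINTS * 3  # 72 axis-angle values per frame
SHAPE_DIM = 10                  # shape coefficients (betas)


def rodrigues(axis_angle: Sequence[float]) -> Matrix:
    """Rotation matrix for an axis-angle vector (direction = axis, norm = angle)."""
    rx, ry, rz = axis_angle
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-8:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    # R = cos(theta) I + sin(theta) [k]x + (1 - cos(theta)) k k^T
    return [
        [c + kx * kx * t, kx * ky * t - kz * s, kx * kz * t + ky * s],
        [ky * kx * t + kz * s, c + ky * ky * t, ky * kz * t - kx * s],
        [kz * kx * t - ky * s, kz * ky * t + kx * s, c + kz * kz * t],
    ]


def pose_to_rotations(pose: Sequence[float]) -> List[Matrix]:
    """Split a flat SMPL pose vector into one rotation matrix per joint."""
    if len(pose) != POSE_DIM:
        raise ValueError(f"expected {POSE_DIM} pose values, got {len(pose)}")
    return [rodrigues(pose[3 * j:3 * j + 3]) for j in range(NUM_SMPL_JOINTS)]
```

A zero pose vector maps every joint to the identity rotation (the template "T-pose"); the SMPL model then poses the template mesh by chaining these rotations down the kinematic tree.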
FreeMoCap is an open-source markerless motion capture system that reconstructs 3D human pose from video. It uses a multi-camera setup with ChArUco board calibration to accurately triangulate body landmarks, and it also supports single-camera recording for simpler captures. The system outputs skeleton joint data and generates interactive Jupyter notebooks for each recording, enabling users to explore and analyse motion data directly. It is built around synchronised multi-camera video capture and MediaPipe-based 2D pose detection.
FreeMoCap is a comprehensive motion capture system that uses MediaPipe for 2D keypoint detection and multi-view geometry to reconstruct 3D human poses, providing a complete Python-based pipeline for tracking and analysis.
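Multi-camera systems such as FreeMoCap recover a 3D landmark by intersecting the viewing rays of the same 2D landmark seen from calibrated cameras. The sketch uses the simple midpoint method for two rays (treated as infinite lines), each given by a camera centre and a direction that calibration would provide. It is a swapped-in illustration; production pipelines typically solve a least-squares problem over all available views instead.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between lines c1 + t*d1 and c2 + s*d2."""
    w0 = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    t = (b * e - c * d) / denom  # parameter of the closest point on line 1
    s = (a * e - b * d) / denom  # parameter of the closest point on line 2
    p1 = _add(c1, _scale(d1, t))
    p2 = _add(c2, _scale(d2, s))
    return _scale(_add(p1, p2), 0.5)
```

With noise-free rays the two closest points coincide and the result is the exact landmark; with real detections the rays miss slightly, and the midpoint splits the difference. The gap length is a useful per-landmark quality check.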
sam-3d-body is a machine learning framework for 3D human mesh recovery and pose estimation. It reconstructs full-body meshes, including the body, hands, and feet, from a single image. The project implements a specialized extension of the Segment Anything Model (SAM) that guides the extraction and refinement of human body shapes. This enables prompt-guided mesh recovery, in which 2D masks and keypoints constrain the inference of 3D pose and shape parameters.
This repository provides a specialized framework for 3D human mesh recovery and pose estimation using prompt-guided models, offering the core functionality required for 3D keypoint detection and Python-based integration.
YOLOv7 is a PyTorch vision library and real-time inference engine designed for object detection, human pose estimation, and instance segmentation. It provides a framework for detecting and locating multiple objects within images or video streams using neural networks. The system includes tools for custom model training and fine-tuning, allowing pre-trained weights to be adapted to specialized datasets via transfer learning. It also supports model weight export and format conversion to facilitate deployment on production servers and embedded edge devices.
This library provides a framework for real-time, multi-person human pose estimation with pre-trained models and Python code, though it is primarily an object detection engine that includes pose estimation as a specialized capability.
EasyMocap is a markerless 3D human motion capture system that recovers body, hand, and face poses from single- or multi-view video without physical markers or suits. It uses parametric body models such as SMPL, SMPL-X, and MANO. The system distinguishes itself through mirror-assisted depth disambiguation: when a single RGB image or video includes a mirror reflection, it uses the reflection to resolve depth ambiguity, computing the mirror's surface normal from vanishing points to improve 3D reconstruction accuracy.
This is a specialized markerless motion capture system that performs 3D human pose estimation and tracking with parametric body models, providing Python-based tools and pre-trained components for multi-view or mirror-assisted reconstruction.
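The mirror-normal computation described for EasyMocap rests on standard projective geometry: segments joining body keypoints to their mirrored counterparts are parallel in 3D (all along the mirror normal), so their images meet at a vanishing point v, and the normal direction is proportional to K^-1 v for camera intrinsics K. The sketch assumes a pinhole camera with square pixels and a known focal length and principal point; it is not EasyMocap's implementation.

```python
import math
from typing import Tuple

Point2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def vanishing_point(p1: Point2, q1: Point2, p2: Point2, q2: Point2) -> Point2:
    """Intersect image line p1-q1 with line p2-q2 using homogeneous coordinates."""
    line1 = cross((p1[0], p1[1], 1.0), (q1[0], q1[1], 1.0))
    line2 = cross((p2[0], p2[1], 1.0), (q2[0], q2[1], 1.0))
    x, y, w = cross(line1, line2)
    if abs(w) < 1e-12:
        raise ValueError("image lines are parallel; vanishing point is at infinity")
    return (x / w, y / w)


def mirror_normal(vp: Point2, focal: float, center: Point2) -> Vec3:
    """Back-project the vanishing point through K^-1 and normalise.

    The sign of the normal is ambiguous; this returns the direction with z > 0.
    """
    d = ((vp[0] - center[0]) / focal, (vp[1] - center[1]) / focal, 1.0)
    norm = math.sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2])
    return (d[0] / norm, d[1] / norm, d[2] / norm)
```

In practice more than two keypoint pairs are available, and a least-squares vanishing point over all of them is more robust to detection noise than intersecting just two lines.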
dlib is a C++ machine learning toolkit and data analysis framework. It provides a collection of algorithms and utilities for building predictive models and performing data analysis in native C++ applications. It also ships Python bindings that expose its high-performance C++ implementations to Python scripts, so the same algorithms can be used in Python-based machine learning development.
This is a general-purpose machine learning and computer vision toolkit that includes pre-trained models for face detection and facial landmark detection, but no dedicated full-body pose estimation model; it is a broader framework rather than a specialized pose library.
Detectron2 is a PyTorch computer vision framework for training and deploying models for object detection, image segmentation, and other visual recognition tasks. It provides a research-oriented environment for training complex vision models with multi-GPU acceleration. It includes an object detection library for identifying and locating multiple objects with bounding boxes, and an image segmentation toolkit for producing pixel-level masks through instance, semantic, and panoptic segmentation. Additionally, it supports human pose estimation through keypoint detection models such as Keypoint R-CNN.
Detectron2 is a comprehensive computer vision framework with human pose estimation capabilities, providing a Python API, pre-trained models, and multi-person tracking support.
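Keypoint detection heads of the kind Detectron2 provides typically predict one heatmap per keypoint (for example, 17 for COCO person keypoints) inside each detected person box, and the peak of each heatmap is mapped back to image coordinates. The sketch shows that decoding step with a plain argmax; it is a generic illustration, and Detectron2's own post-processing differs in detail.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in image pixels


def decode_heatmap(heatmap: Sequence[Sequence[float]], box: Box) -> Tuple[float, float, float]:
    """Return (x, y, score) in image coordinates for one keypoint heatmap.

    The heatmap is an H x W grid of scores covering the detection box.
    """
    h, w = len(heatmap), len(heatmap[0])
    best_row, best_col, best = 0, 0, heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best:
                best_row, best_col, best = r, c, value
    x0, y0, x1, y1 = box
    # Map the centre of the winning heatmap cell back into the box.
    x = x0 + (best_col + 0.5) * (x1 - x0) / w
    y = y0 + (best_row + 0.5) * (y1 - y0) / h
    return (x, y, best)
```

Running this once per keypoint channel yields the full skeleton for one person; the returned score is often thresholded to decide whether a keypoint is visible.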
PaddleDetection is an object detection framework for the end-to-end development, training, and deployment of computer vision models. It provides a comprehensive library of modular neural network architectures and pipelines that support object detection, instance segmentation, and multi-object tracking. The project distinguishes itself through a configuration-driven approach that decouples model components such as backbones and heads, so custom vision workflows can be assembled flexibly. It incorporates advanced techniques such as anchor-free detection.
This framework provides a comprehensive suite of computer vision tools, including dedicated modules and pre-trained models for human pose estimation and multi-person tracking, with support for real-time inference and Python-based development.
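Multi-object tracking, as mentioned for PaddleDetection, links detections across frames so that each person keeps a stable ID. The sketch is a deliberately simple greedy IoU tracker swapped in for illustration; PaddleDetection's trackers typically use more sophisticated association (appearance embeddings and motion models), and the `IouTracker` class is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    """Hypothetical greedy IoU tracker that keeps person IDs stable across frames.

    Unmatched tracks are forgotten immediately; real trackers keep them alive
    for a few frames and add motion or appearance cues.
    """

    def __init__(self, iou_threshold: float = 0.3) -> None:
        self.iou_threshold = iou_threshold
        self.tracks: Dict[int, Box] = {}
        self._next_id = 0

    def update(self, boxes: Sequence[Box]) -> List[int]:
        """Assign a track ID to each detection in the new frame."""
        ids = [-1] * len(boxes)  # -1 marks "not yet matched"
        candidates = sorted(
            (
                (iou(prev, box), tid, j)
                for tid, prev in self.tracks.items()
                for j, box in enumerate(boxes)
            ),
            reverse=True,
        )
        used_tracks, used_dets = set(), set()
        for score, tid, j in candidates:
            if score < self.iou_threshold:
                break  # remaining pairs overlap even less
            if tid in used_tracks or j in used_dets:
                continue
            ids[j] = tid
            used_tracks.add(tid)
            used_dets.add(j)
        for j in range(len(boxes)):
            if ids[j] == -1:  # unmatched detection starts a new track
                ids[j] = self._next_id
                self._next_id += 1
        self.tracks = {tid: boxes[j] for j, tid in enumerate(ids)}
        return ids
```

IoU-only association works when people move little between frames; crossings and occlusions are where embedding- and motion-based trackers earn their extra complexity.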
Ultralytics is a comprehensive computer vision framework for training, validating, and deploying deep learning models across a wide range of visual recognition tasks. It provides a unified interface for core operations, including object detection, instance segmentation, pose estimation, and image classification. Its modular architecture lets users swap model components to balance inference speed against accuracy for different applications. The framework distinguishes itself through its support for real-time processing and flexible deployment.
This framework provides a robust, real-time pose estimation engine with multi-person tracking and pre-trained models accessible via a Python API, though it is primarily optimized for 2D keypoint detection rather than native 3D estimation.
GoCV is a computer vision library that provides Go language bindings for OpenCV. It serves as an image processing toolkit and deep learning inference engine, giving programmatic access to a wide range of algorithms for image manipulation, object detection, and video analysis. The project differentiates itself through high-performance native bindings and hardware acceleration: it uses cgo to map Go calls onto OpenCV's C++ functions, and it can dispatch neural network inference to backends such as CUDA and OpenVINO.
This library provides Go bindings for OpenCV, offering the necessary tools and pre-trained model support to implement human pose estimation and multi-person tracking, though it requires custom development in Go rather than providing a ready-to-use Python API.
This project is a modular PyTorch framework for training and evaluating object detection and instance segmentation models. It serves as a computer vision research tool and a deep learning inference engine that identifies object locations, classes, and pixel-level masks within images. The framework implements a two-stage inference pipeline that uses region proposal networks and a symmetric mask-head architecture. It provides specialized capabilities for instance segmentation, bounding-box object detection, and human pose estimation via anatomical keypoint detection.
This framework provides a robust PyTorch-based environment for human pose estimation and keypoint detection, offering the necessary Python API and inference capabilities to handle complex computer vision tasks.
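In a two-stage pipeline like the one described for this framework, the region proposal network emits many overlapping boxes that are usually thinned out with non-maximum suppression (NMS) before the second stage. A minimal greedy NMS is sketched below in plain Python; the IoU threshold is a parameter, and this is a generic implementation rather than the project's own (often GPU-accelerated) kernel.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float) -> List[int]:
    """Greedy NMS: keep boxes in score order, dropping any that overlap a kept box."""
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

The threshold trades recall for redundancy: a lower value removes more near-duplicates but can also suppress genuinely distinct, overlapping people in a crowd.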
This project is a collection of educational resources and implementation guides that provide deep learning recipes, code samples, and step-by-step walkthroughs for computer vision tasks. It organizes complex workflows into modular recipes to help users build image and video analysis models. It focuses on specialized vision capabilities, including image similarity with fast retrieval and re-ranking, human pose estimation, and video action recognition. It also provides tools for crowd density estimation and document image cleaning.
This repository provides modular PyTorch-based recipes and implementation guides for human pose estimation, offering the necessary Python workflows and pre-trained model integration to build custom tracking solutions.
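The image-similarity workflow mentioned for this repository ranks gallery images by how close their embeddings are to a query embedding. The sketch assumes embeddings have already been computed (for example by a CNN) and uses cosine similarity, plus a simple average-query-expansion step as one possible re-ranking method; the recipes themselves may use a different re-ranking technique, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def retrieve(query: Sequence[float], gallery: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Names of the k gallery items most similar to the query."""
    ranked = sorted(gallery.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


def rerank_query_expansion(query: Sequence[float], gallery: Dict[str, Sequence[float]],
                           k: int, expand: int = 2) -> List[str]:
    """Average the query with its top `expand` matches, then search again."""
    top = retrieve(query, gallery, expand)
    vectors = [list(query)] + [list(gallery[name]) for name in top]
    expanded = [sum(col) / len(vectors) for col in zip(*vectors)]
    return retrieve(expanded, gallery, k)
```

Query expansion pulls the query toward its confident neighbours, which tends to help when the nearest matches are reliable and to hurt when the initial top results are already wrong.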