30 open-source projects similar to mvig-sjtu/alphapose, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best AlphaPose alternative.
OpenPose is a real-time pose estimation engine designed to detect and track human body, face, hand, and foot landmarks. It functions as a multi-person motion tracker, identifying the spatial coordinates of multiple individuals simultaneously within video streams or static images. Beyond two-dimensional detection, the software acts as a three-dimensional kinematics processor, reconstructing spatial movement data from single or multiple synchronized camera perspectives. The system distinguishes itself through a bottom-up approach that utilizes part-affinity fields to associate body parts across
MMPose is a PyTorch-based pose estimation toolbox and deep learning training pipeline designed for detecting 2D and 3D keypoints on humans, animals, and faces. It serves as a computer vision model zoo and a framework for both 2D pose estimation and 3D pose lifting. The project is distinguished by its modular architecture and extensibility, employing a registry-based system and hierarchical configurations to allow for custom algorithm integration and model pipeline customization. It supports diverse estimation paradigms, including top-down, bottom-up, and two-stage pose lifting workflows. The
This is a multi-person pose estimation framework designed for real-time human keypoint detection. It functions as a bottom-up human pose estimator that identifies skeletal joints across all people in a scene without requiring a separate person detector. The system utilizes a convolutional neural network model to generate heatmaps and vector fields for posture analysis. It specifically implements part affinity fields to encode the location and orientation of limbs, allowing the model to connect individual joints into complete skeletons. The project covers computer vision motion analysis and d
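As a rough illustration of how a part affinity field can score a candidate limb, the sketch below samples a 2D vector field along the segment between two joint candidates and averages its dot product with the segment's unit direction. The function name `paf_score`, the field layout (`paf_x[row][col]`, `paf_y[row][col]`), and the sample count are assumptions made for illustration, not this project's actual code.

```python
import math

def paf_score(paf_x, paf_y, joint_a, joint_b, num_samples=10):
    """Average alignment between a vector field and the segment a -> b.

    paf_x, paf_y: 2D lists indexed as [row][col] with the field's x and y components.
    joint_a, joint_b: (x, y) pixel coordinates of two candidate joints.
    """
    ax, ay = joint_a
    bx, by = joint_b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0  # coincident joints define no limb direction
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb

    total = 0.0
    for i in range(num_samples):
        # Evenly spaced sample points from joint_a to joint_b (inclusive).
        t = i / (num_samples - 1) if num_samples > 1 else 0.0
        x = int(round(ax + t * dx))
        y = int(round(ay + t * dy))
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / num_samples
```

A score near 1 means the field points along the candidate limb; a score near 0 means the field is perpendicular or empty there, so the two joints probably do not belong to the same limb.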
VIBE is a 3D human pose estimation framework designed to reconstruct human body shapes and poses from video frames. It functions as a toolkit for predicting parameters of the SMPL human body model to generate 3D mesh sequences. The system includes a 3D motion data exporter to convert predicted pose sequences into standard 3D file formats for use in graphics and animation software. It also provides a structured training pipeline for preparing datasets and training models to estimate body shapes from images. Its capabilities cover computer vision for estimating body pose and shape, as well as
DensePose is a 3D human pose estimation framework designed to map 2D image pixels to a 3D surface-based model of the human body in real time. It functions as a computer vision anatomical mapper that projects 2D visual data onto a 3D surface to create detailed anatomical representations. The system operates as an image-to-3D texture transfer engine, localizing 2D image annotations onto 3D models to apply photographic textures to digital human representations. It uses a surface-based body mapping method to associate human pixels in an RGB image with specific coordinates on a 3D body template.
This project is a deep learning framework built for detecting and tracking human body keypoints in images and video streams. It functions as both a real-time motion tracking system and a machine learning environment for training and evaluating pose estimation models. The system utilizes a two-branch convolutional neural network to predict body part locations and their directional connections simultaneously. It employs multi-stage feature refinement to improve keypoint localization accuracy and uses greedy parsing and bipartite matching algorithms to associate detected parts into individual sk
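The greedy association step mentioned above can be sketched as follows, assuming each candidate pair of parts already has a connection score. The function name `greedy_match`, the score-matrix layout, and the `min_score` cutoff are hypothetical and not taken from this project.

```python
def greedy_match(scores, min_score=0.0):
    """Greedily pair candidates from two part types.

    scores[i][j] is the connection score between candidate i of part A
    and candidate j of part B. Each candidate is used at most once.
    """
    # Collect every pair above the cutoff, strongest first.
    pairs = [
        (score, i, j)
        for i, row in enumerate(scores)
        for j, score in enumerate(row)
        if score > min_score
    ]
    pairs.sort(reverse=True)

    used_a, used_b, matches = set(), set(), []
    for score, i, j in pairs:
        if i not in used_a and j not in used_b:
            matches.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return matches
```

For example, `greedy_match([[0.9, 0.2], [0.8, 0.7]])` pairs candidate 0 with 0 first, then candidate 1 with 1, because the 0.8 connection competes for an already-used candidate.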
VideoPose3D is a machine learning framework designed for 3D human pose estimation. It functions as a motion reconstruction tool that predicts 3D joint positions from 2D video sequences using a temporal convolutional network to process body movement over time. The project includes a semi-supervised learning pipeline that improves pose accuracy by combining labeled datasets with unlabeled video data and projection consistency loss. It also features a video pose visualizer capable of rendering 3D skeleton reconstructions and 2D keypoints as overlays on original footage. The framework covers the
This is a real-time object detection framework built on the YOLOv3 architecture, implemented in PyTorch. It provides a complete pipeline for identifying and localizing objects in images and video using a single neural network pass, combining a Darknet-53 backbone with multi-scale feature pyramids and anchor-based bounding box prediction. The framework extends beyond basic detection to include instance segmentation, human pose estimation, and multi-object tracking across video frames. It offers a model export toolkit that converts trained models through ONNX to CoreML, TensorFlow Lite, and Ten
Detectron2 is a PyTorch computer vision framework and visual recognition platform designed for training and deploying models for object detection, image segmentation, and visual recognition. It provides a research-oriented environment for training complex vision models with multi-GPU acceleration. The project includes a specialized object detection library for identifying and locating multiple objects via bounding boxes, as well as an image segmentation toolkit for creating pixel-level masks through instance, semantic, and panoptic segmentation. Additionally, it features a human pose estimati
PaddleDetection is an object detection framework designed for the end-to-end development, training, and deployment of computer vision models. It provides a comprehensive library of modular neural network architectures and pipelines that support object detection, instance segmentation, and multi-object tracking tasks. The project distinguishes itself through a configuration-driven approach that decouples model components like backbones and heads, allowing for the flexible assembly of custom vision workflows. It incorporates advanced techniques such as anchor-free detection logic, joint detecti
This project is a PyTorch-based computer vision library and deep learning image processing framework. It provides a collection of neural network architectures designed for visual analysis tasks, specifically focusing on image classification, object detection, and semantic segmentation. The toolset implements diverse methodologies for visual recognition, including anchor-free object detection, regional proposal networks, and heatmap-based keypoint estimation. It utilizes both convolutional neural networks for spatial feature extraction and transformer-based self-attention mechanisms to compute
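Heatmap-based keypoint estimation typically ends with a decoding step that turns each predicted heatmap into a coordinate. A minimal sketch of that step, assuming one 2D heatmap per keypoint stored as a list of rows (the name `decode_heatmap` is hypothetical, and real libraries usually add sub-pixel refinement):

```python
def decode_heatmap(heatmap):
    """Return ((x, y), confidence) for the strongest response in a 2D heatmap."""
    best_value = float("-inf")
    best_xy = (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value = value
                best_xy = (x, y)
    return best_xy, best_value
```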
DeepLabCut is a deep learning toolkit for markerless 2D and 3D animal pose estimation. It functions as a motion tracking system that identifies anatomical keypoints on animals in video sequences without the need for physical markers. The framework utilizes transfer learning and a library of pre-trained weights to accelerate the training of networks for different species. It supports multi-individual identity tracking to maintain unique identities across video sequences and offers real-time pose detection for live video feeds. The system covers a broad range of computer vision capabilities, i
This project is a PyTorch implementation of a research architecture designed for high-resolution representation learning. It serves as a computer vision framework focused on precise keypoint detection, human pose estimation, and semantic image segmentation. The implementation provides specialized tools for identifying anatomical landmarks on the human body and predicting facial keypoint coordinates to analyze orientation and alignment. It utilizes a system of multi-resolution parallel streams and repeated multi-scale fusion to maintain high-resolution representations throughout the network.
Sapiens is a high-resolution human vision model designed for high-precision, human-centric computer vision tasks. It functions as a suite of tools for estimating human pose, depth, and surface geometry. The project utilizes a vision transformer backbone to perform multiple tasks through a shared encoder. This architecture enables the simultaneous prediction of skeletal structures, joint locations, and the distance between a camera and a human subject. The model's capabilities cover human body part segmentation to isolate anatomical regions from backgrounds and surface normal prediction to re
sam-3d-body is a machine learning framework for 3D human mesh recovery and pose estimation. It utilizes a 3D human mesh recovery model to reconstruct full-body meshes, including the body, hands, and feet, from a single image. The project implements a specialized extension of the Segment Anything Model to guide the extraction and refinement of human body shapes. This integration allows for prompt-guided mesh recovery, where 2D masks and keypoints constrain the inference of 3D pose and shape parameters. The system covers a range of computer vision capabilities, including 3D spatial alignment t
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
This project is a modular PyTorch framework for training and evaluating object detection and instance segmentation models. It serves as a computer vision research tool and a deep learning inference engine designed to identify object locations, classes, and pixel-level masks within images. The framework implements a two-stage inference pipeline that utilizes region proposal networks and a symmetric mask-head architecture. It provides specialized capabilities for instance segmentation, object bounding box detection, and human pose estimation via anatomical keypoint detection. The system includ
FreeMoCap is an open-source markerless motion capture system that reconstructs 3D human pose from video. It uses a multi-camera setup with ChArUco board calibration to accurately triangulate body landmarks, and it also supports single-camera recording for simpler captures. The system outputs skeleton joint data and generates interactive Jupyter notebooks for each recording, enabling users to explore and analyse motion data directly. Built around hardware-synchronised video capture and MediaPipe-based 2D pose detection, FreeMoCap supports both calibrated multi-camera recording and real-time 2D
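To give a sense of what triangulating a landmark from two calibrated views involves, here is a sketch of the simple midpoint method: each camera contributes a ray through the detected 2D landmark, and the 3D estimate is the midpoint of the shortest segment between the two rays. This is a stand-in technique chosen for brevity, not FreeMoCap's actual triangulation code, and the function name and ray inputs are assumptions.

```python
def triangulate_midpoint(origin1, dir1, origin2, dir2):
    """Estimate a 3D point from two rays (camera center plus viewing direction)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = [a - b for a, b in zip(origin1, origin2)]
    a, b, c = dot(dir1, dir1), dot(dir1, dir2), dot(dir2, dir2)
    d, e = dot(dir1, w), dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")

    # Parameters of the closest points on each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(origin1, dir1)]
    p2 = [o + t * k for o, k in zip(origin2, dir2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]
```

With noisy detections the two rays rarely intersect exactly, which is why the midpoint (or a least-squares equivalent across more cameras) is used as the estimate.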
This is a PyTorch deep learning framework and tool for human motion synthesis that generates 3D character animations from text prompts or action descriptions. It functions as a text-to-motion generator that converts natural language and categorical labels into temporally consistent 3D skeletal movement sequences. The system utilizes a transformer-based diffusion model to iteratively denoise motion data. It includes capabilities for action-conditioned generation, monocular-to-3D motion lifting, and motion sequence editing using text constraints. The framework incorporates geometric motion con
Relativ is an open-source project for the development of custom virtual reality hardware, encompassing the mechanical design, electronics, and software interfaces required to build a headset from scratch. It provides the frameworks necessary for assembling devices using open-source electronics and firmware. The project integrates custom hardware with SteamVR through driver-based configurations, mapping device identifiers and display viewports to ensure rendered images align with physical secondary displays. It employs a combination of microcontroller-based inertial measurement unit polling fo
This project is a collection of educational resources and implementation frameworks providing deep learning model recipes, code samples, and step-by-step guides for computer vision tasks. It organizes complex workflows into modular recipes and implementation guides to facilitate the building of image and video analysis models. The framework focuses on specialized vision capabilities, including an image similarity framework for fast retrieval and re-ranking, human pose estimation, and video action recognition. It also provides specific tools for crowd density estimation and document image clea
This project is the official implementation of the authors' ECCV 2018 paper "Simple Baselines for Human Pose Estimation and Tracking" (https://arxiv.org/abs/1804.06208).
This project is a pretrained model library for PyTorch, providing a collection of convolutional neural network architectures and weights. It serves as a computer vision model zoo for image classification and feature extraction, offering a framework for transfer learning where pretrained networks are adapted for custom image recognition tasks. The library focuses on transforming images into high-level numerical representations and calculating class probability scores. It includes utilities for downloading and initializing standard architectures such as ResNet, Inception, and Xception. Capabil
This project is a 3D visual localization framework designed to determine a camera's exact position and orientation by matching 2D image features against a 3D reference model. It includes a structure-from-motion pipeline to reconstruct 3D scene geometry from unordered image sets, creating the necessary spatial maps for localization. The system employs a hierarchical coarse-to-fine localization approach. This process begins with a global-descriptor image retrieval system to identify candidate reference images from a large database and progresses through local feature matching to final 3D model
This project is a Python bio-imaging toolkit and analysis suite designed for processing and analyzing microscopy and medical images. It provides a collection of tools for image quantification, medical image segmentation, and general bio-imaging workflows. The suite includes specialized capabilities for quantifying biological data, such as measuring neuron branching complexity via Sholl analysis, calculating particle size distributions, and tracking wound area in scratch assays. It also features a medical image segmentation library that implements U-Net architectures for isolating anatomical s
RuView is a WiFi spatial sensing platform that uses radio frequency reflections to detect presence, track body poses, and monitor vital signs without the use of cameras. It functions as a 3D point-cloud spatial mapper, converting signal disturbances into coordinate sets to visualize physical environments and human movement. The system operates as a distributed sensing mesh where synchronized nodes use consensus and shared audit trails to maintain data consistency across a swarm. It further acts as an MQTT home automation bridge, streaming real-time spatial telemetry and occupancy data to smar
GoCV is a computer vision library and Go language binding for OpenCV. It serves as an image processing toolkit and deep learning inference engine, providing programmatic access to a wide range of algorithms for image manipulation, object detection, and video analysis. The project differentiates itself through high-performance native bindings and hardware acceleration. It utilizes a foreign function interface to map Go calls to C++ functions and includes a hardware-agnostic backend dispatch to route neural network tasks to computation engines such as CUDA and OpenVINO. The library covers a br
This is a PyTorch object detection framework that implements the Single Shot MultiBox Detector for identifying and localizing multiple objects within images and video. The project provides a neural network architecture designed for single-shot object detection, which predicts bounding boxes and class labels in one pass. The implementation includes a real-time object detector capable of processing live video streams to track and label objects across sequential frames. It also features a complete computer vision training pipeline for preparing image datasets and training model weights. The fra
YOLOv10 is a PyTorch computer vision library and real-time vision framework designed for locating and identifying multiple objects in images and video streams. It functions as an end-to-end object detector that optimizes for high-speed deployment and detection precision. The project is distinguished by an NMS-free detection architecture that predicts a single bounding box per object, eliminating the need for non-maximum suppression post-processing to reduce inference latency. It further optimizes for edge hardware through scalable weights and a quantization-friendly structure that facilitates
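For context, the post-processing step that an NMS-free detector avoids looks roughly like the sketch below: greedy non-maximum suppression over axis-aligned boxes. This is a generic illustration rather than code from YOLOv10; the `(x1, y1, x2, y2)` box format and the 0.5 threshold are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept after greedy non-maximum suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Because every candidate is compared against the boxes already kept, this step adds latency that grows with the number of raw detections, which is the cost a one-box-per-object design sidesteps.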