EasyMocap | Awesome Repository

EasyMocap is a markerless 3D human motion capture system that recovers body, hand, and face poses from single-view or multi-view video without physical markers or suits. It fits parametric body models such as SMPL, SMPL-X, and MANO. For single-view input, it uses a mirror reflection to resolve depth ambiguity, estimating the mirror's surface normal from vanishing points.
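The vanishing-point idea can be sketched in a few lines. A 3D point and its mirror image are joined by a segment parallel to the mirror normal, so the image lines through each 2D keypoint and its reflected counterpart meet at the vanishing point v of that direction, and the normal is proportional to K^-1 v. The sketch below is an illustration under assumptions, not EasyMocap's own code: it assumes a pinhole camera with known intrinsics (fx, fy, cx, cy) and exactly two noise-free correspondences, and the function names are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def vanishing_point(pairs):
    """Intersect the two image lines joining each keypoint to its mirror image.

    pairs: [((u, v), (u_mirror, v_mirror)), ...] with exactly two entries.
    Returns the intersection in homogeneous coordinates (x, y, w).
    """
    (p1, q1), (p2, q2) = pairs
    # A line through two homogeneous points is their cross product.
    line1 = cross((p1[0], p1[1], 1.0), (q1[0], q1[1], 1.0))
    line2 = cross((p2[0], p2[1], 1.0), (q2[0], q2[1], 1.0))
    # The intersection of two homogeneous lines is again a cross product.
    return cross(line1, line2)


def mirror_normal(pairs, fx, fy, cx, cy):
    """Unit mirror normal in camera coordinates, defined only up to sign."""
    vx, vy, vw = vanishing_point(pairs)
    # Apply K^-1 for K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    d = ((vx - cx * vw) / fx, (vy - cy * vw) / fy, vw)
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0.0:
        raise ValueError("degenerate input: the two lines coincide")
    return tuple(c / norm for c in d)
```

With noisy detections or more than two correspondences, a least-squares intersection of all the lines would replace the single cross product. The returned normal has an ambiguous sign, so a caller would typically orient it toward the camera.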

The system distinguishes itself through mirror-assisted depth disambiguation, enabling accurate 3D pose reconstruction from a single RGB image or video that includes a mirror reflection. It also supports multi-view triangulation and bundle adjustment calibration for synchronized camera setups, and can fit parametric models to 2D keypoints and silhouettes for robust 3D pose recovery. Reconstructed motion data can be exported to standard animation formats such as BVH and ASF/AMC.
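As an illustration of the multi-view triangulation step, the following is a minimal sketch of linear (DLT) triangulation of one keypoint from several calibrated views. It assumes each camera is described by a 3x4 projection matrix P = K[R|t] and that the 2D detections are already matched across views. The function name `triangulate_dlt` is hypothetical and is not taken from EasyMocap.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from two or more views.

    projections: iterable of 3x4 camera projection matrices.
    points_2d:   iterable of (u, v) pixel observations, one per camera.
    Returns the 3D point as a length-3 array.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view adds two linear constraints on the homogeneous point X:
        # (u * P[2] - P[0]) @ X = 0 and (v * P[2] - P[1]) @ X = 0.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution is the right singular vector that belongs
    # to the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    # Dehomogenize; X[3] close to zero would mean a point at infinity.
    return X[:3] / X[3]
```

In practice a linear estimate like this usually serves as an initial value that a nonlinear refinement such as bundle adjustment then improves by minimizing reprojection error.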

Additional capabilities include CNN-based pose initialization, deformable mesh tracking, and a real-time visualization pipeline for immediate feedback during capture. The project also provides a manual annotation tool for labeling bounding boxes, keypoints, and segmentation masks to create ground-truth data.

Features

Markerless Motion Capture - Recovers 3D body, hand, and face poses from single-view or multi-view video without physical markers or suits.
Multi-View Body-Hand-Face Captures - Recovers body, hand, and face poses from video using parametric models and deformable mesh tracking without physical markers.
3D Pose Estimation - Uses a convolutional neural network to produce an initial 3D pose estimate from a single RGB image.
