SAM 3D Objects is a promptable foundation model that recovers 3D objects and human meshes from a single image. Given a mask for each object in a photograph, it reconstructs a full 3D model with pose, shape, texture, and layout, and it also produces complete 3D human body meshes from the same image.
The system first uses promptable segmentation to isolate objects and humans, reconstructs each one independently, and then aligns the reconstructions into a shared coordinate frame. Because every reconstruction from the same image ends up in that frame, the output supports scene-level understanding rather than a set of unrelated meshes.
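The alignment step can be illustrated with a minimal sketch. It assumes each reconstruction comes with a per-object similarity transform (a scale, a rotation, and a translation) that maps its local vertices into the shared frame. This is a common way to express object layout, but the text does not confirm it as the model's actual parameterization. The function names, the toy geometry, and the pose values below are all hypothetical.

```python
import numpy as np


def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix about the z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def to_scene_frame(vertices, scale, rotation, translation):
    """Map (N, 3) object-local vertices into the shared frame: x' = s * R @ x + t.

    Assumption: a similarity transform per object; the real model's layout
    parameterization may differ.
    """
    vertices = np.asarray(vertices, dtype=float)
    rotation = np.asarray(rotation, dtype=float)
    translation = np.asarray(translation, dtype=float)
    # Row vectors: v @ R.T is the transpose of R @ v.
    return scale * (vertices @ rotation.T) + translation


# Toy stand-in for a reconstructed mesh: the corners of a unit square.
unit_square = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)

# Hypothetical per-object poses, as if estimated from one image.
poses = {
    "chair": dict(scale=2.0, rotation=rotation_z(np.pi / 2), translation=[1.0, 0.0, 3.0]),
    "person": dict(scale=1.0, rotation=np.eye(3), translation=[-1.0, 0.0, 2.5]),
}

# Each reconstruction starts in its own local frame and ends up in a common one.
scene = {name: to_scene_frame(unit_square, **pose) for name, pose in poses.items()}
```

After this step, distances and relative positions between the "chair" and "person" vertices are meaningful, which is what a shared coordinate frame provides.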
The pipeline is end-to-end differentiable, so segmentation, reconstruction, alignment, and texture recovery can be optimized jointly from monocular input. It relies on learned parametric models and geometric reasoning to estimate 3D shape, pose, and texture from a single view.