MMDetection3D is an open-source toolbox for 3D perception, providing a unified framework for detecting and segmenting objects in three-dimensional environments. It supports a range of core tasks including monocular 3D object detection from single camera images, LiDAR-based 3D object detection from raw point clouds, and multi-modal fusion that combines camera images with LiDAR data. The toolbox also covers point cloud semantic segmentation, assigning class labels to every point in a scan for scene understanding.
The project is built around a config-driven pipeline that drives the full training, evaluation, and inference workflow, with support for distributed training across multiple GPUs and machines. A registry-based composition system assembles custom models from encoder, backbone, neck, head, and loss components, and sparse convolution can be accelerated through libraries such as spconv and MinkowskiEngine. A unified dataset conversion system transforms raw sensor data from benchmarks such as KITTI, Waymo, and nuScenes into a standardized internal structure. Checkpoint-based resumption lets an interrupted training run continue from its last saved state, and mixed precision training can reduce memory use and training time.
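The registry-and-config idea described above can be illustrated with a small, self-contained sketch. This is not the toolbox's own code: the `Registry` class, the `ToyBackbone`, `ToyHead`, and `ToyDetector` components, and every parameter name are hypothetical stand-ins, written under the assumption that a config is a nested dictionary whose `type` key names a registered class.

```python
from typing import Any, Dict


class Registry:
    """Toy name-to-class registry (illustrative only, not the toolbox's class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register(self, cls: type) -> type:
        # Used as a class decorator: store the class under its own name.
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg: Dict[str, Any]) -> Any:
        # Copy first so the caller's config dictionary is not mutated.
        args = dict(cfg)
        cls = self._modules[args.pop("type")]
        return cls(**args)


MODELS = Registry("models")


@MODELS.register
class ToyBackbone:
    def __init__(self, in_channels: int, out_channels: int) -> None:
        self.in_channels = in_channels
        self.out_channels = out_channels


@MODELS.register
class ToyHead:
    def __init__(self, num_classes: int) -> None:
        self.num_classes = num_classes


@MODELS.register
class ToyDetector:
    def __init__(self, backbone: Dict[str, Any], head: Dict[str, Any]) -> None:
        # Sub-components arrive as config dicts and are built recursively.
        self.backbone = MODELS.build(backbone)
        self.head = MODELS.build(head)


# A hypothetical config: swapping a component means editing this dictionary,
# not the model code.
model_cfg = dict(
    type="ToyDetector",
    backbone=dict(type="ToyBackbone", in_channels=4, out_channels=64),
    head=dict(type="ToyHead", num_classes=3),
)
model = MODELS.build(model_cfg)
```

The design choice this sketch highlights is that the config names components by string, so an experiment can replace a backbone or head by changing one `type` entry without touching the code that assembles the model.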
Beyond its core detection and segmentation capabilities, the project provides tools for data preparation, augmentation, and evaluation. These include data structures for LiDAR, multi-modal, and vision-based detection tasks, point cloud augmentation techniques, and dataset-specific evaluation protocols with metrics such as mean Average Precision (mAP). The toolbox also supports model deployment, leaderboard submission for autonomous driving benchmarks, and integration with over 500 pre-trained 2D detection models from a shared codebase. It can be installed via pip or the MIM tool, and it runs in Docker containers and on Windows.
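As a rough illustration of the mean Average Precision metric mentioned above, the sketch below computes a non-interpolated AP per class and averages across classes. It is a simplification, not any benchmark's official protocol: it assumes each detection has already been matched to ground truth and flagged as a true or false positive, whereas real protocols define their own matching criteria and recall sampling. The function names and input layout are hypothetical.

```python
from typing import Dict, List, Tuple

# A detection is (confidence score, is_true_positive).
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Non-interpolated AP: mean of the precision at each true positive."""
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            # Precision at the point where recall increases.
            precision_sum += true_positives / rank
    # Dividing by num_gt penalizes ground-truth objects that were missed.
    return precision_sum / num_gt


def mean_average_precision(
    per_class: Dict[str, Tuple[List[Detection], int]]
) -> float:
    """Average the per-class AP values; returns 0.0 for an empty input."""
    aps = [average_precision(dets, num_gt) for dets, num_gt in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Made-up example data: two classes with a handful of detections each.
example = {
    "car": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "pedestrian": ([(0.6, True)], 1),
}
car_ap = average_precision(*example["car"])
map_value = mean_average_precision(example)
```

For the made-up data, the car class scores (1/1 + 2/3) / 2 and the pedestrian class scores 1.0, so the mean sits between the two.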