RF-DETR is a Python library for training and deploying object detection, instance segmentation, and keypoint detection models built on a vision transformer architecture. It provides a unified command-line interface and Python API for the full workflow, from fine-tuning pretrained checkpoints on custom datasets to running inference on images, video files, and live camera streams.
The project supports training on datasets in COCO or YOLO format, with automatic format detection and configurable augmentation pipelines. Models can be exported to ONNX, TFLite, or TensorRT for deployment across edge hardware, mobile devices, and serverless APIs. Training includes built-in experiment tracking with TensorBoard, Weights & Biases, MLflow, and ClearML, along with multi-GPU support, early stopping, and automatic checkpoint selection based on validation mAP.
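To make the early stopping and checkpoint selection behavior concrete, here is a minimal, library-independent sketch of the general technique. It is not RF-DETR's actual implementation: the class name `BestCheckpointTracker`, the `patience` and `min_delta` parameters, and the mAP values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestCheckpointTracker:
    """Hypothetical helper: remembers the best validation mAP and
    signals when training has stopped improving."""

    patience: int = 3        # epochs to wait without improvement (assumed default)
    min_delta: float = 0.0   # smallest mAP gain that counts as an improvement
    best_map: float = float("-inf")
    best_epoch: Optional[int] = None
    stale_epochs: int = 0

    def update(self, epoch: int, val_map: float) -> bool:
        """Record one epoch's validation mAP; return True if training should stop."""
        if val_map > self.best_map + self.min_delta:
            # New best: this is the epoch whose checkpoint would be kept.
            self.best_map = val_map
            self.best_epoch = epoch
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


# Illustrative mAP values only, not real training results.
history = [0.41, 0.47, 0.52, 0.51, 0.52, 0.50, 0.49]

tracker = BestCheckpointTracker(patience=3, min_delta=0.001)
stopped_at = None
for epoch, val_map in enumerate(history):
    if tracker.update(epoch, val_map):
        stopped_at = epoch
        break

print(f"best epoch: {tracker.best_epoch}, stopped at: {stopped_at}")
```

In this sketch the checkpoint from the best epoch is the one that would be selected, and training halts once `patience` consecutive epochs fail to beat it by at least `min_delta`.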
Inference capabilities cover batch processing, real-time detection from webcams or RTSP streams, and per-instance segmentation masks. The library also provides tools for converting between dataset formats and caching model weights locally for faster repeated predictions.
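The core of a COCO-to-YOLO conversion is the bounding box transform: COCO stores boxes as absolute `[x_min, y_min, width, height]` in pixels, while YOLO stores them as `[x_center, y_center, width, height]` normalized by the image size. The sketch below shows that arithmetic in plain Python. The function names `coco_to_yolo` and `yolo_to_coco` are hypothetical and are not part of the library's API; a full converter would also need to handle class IDs, file layout, and segmentation polygons.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]


def coco_to_yolo(bbox: Sequence[float], img_w: int, img_h: int) -> Box:
    """Convert a COCO box (x_min, y_min, w, h in pixels) to a YOLO box
    (x_center, y_center, w, h, all normalized to [0, 1])."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,
        (y_min + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


def yolo_to_coco(box: Sequence[float], img_w: int, img_h: int) -> Box:
    """Convert a normalized YOLO box back to a COCO box in pixels."""
    x_c, y_c, w, h = box
    box_w, box_h = w * img_w, h * img_h
    return (x_c * img_w - box_w / 2, y_c * img_h - box_h / 2, box_w, box_h)


# Example: a 200x100 pixel box with its top-left corner at (100, 50)
# in a 640x480 image.
yolo_box = coco_to_yolo([100, 50, 200, 100], img_w=640, img_h=480)
round_trip = yolo_to_coco(yolo_box, img_w=640, img_h=480)
print(yolo_box)
print(round_trip)
```

Converting a box to YOLO format and back should reproduce the original pixel coordinates up to floating-point rounding, which makes a round trip a convenient sanity check when validating a converted dataset.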