RF-DETR

RF-DETR is a Python library for training and deploying object detection, instance segmentation, and keypoint detection models built on a vision transformer architecture. It provides a unified command-line interface and Python API for the full workflow, from fine-tuning pretrained checkpoints on custom datasets to running inference on images, video files, and live camera streams.

The project supports training on datasets in COCO or YOLO format, with automatic format detection and configurable augmentation pipelines. Models can be exported to ONNX, TFLite, or TensorRT for deployment on edge hardware, mobile devices, and serverless APIs. Training includes built-in experiment tracking with TensorBoard, Weights & Biases, MLflow, and ClearML, along with multi-GPU support, early stopping, and automatic checkpoint selection based on validation mAP.
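As a rough illustration of how early stopping and checkpoint selection on validation mAP can fit together, here is a minimal standalone sketch. It does not use RF-DETR's own API: the `EarlyStopper` class, the `select_best_epoch` helper, and the `patience` and `min_delta` parameters are hypothetical names chosen for this example, and the library's actual behavior may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class EarlyStopper:
    """Hypothetical helper: tracks the best validation mAP seen so far."""

    patience: int = 3       # assumed: epochs to wait without improvement
    min_delta: float = 0.0  # assumed: smallest mAP gain that counts
    best_map: float = float("-inf")
    best_epoch: Optional[int] = None
    stale_epochs: int = 0

    def update(self, epoch: int, val_map: float) -> bool:
        """Record one epoch's mAP; return True when training should stop."""
        if val_map > self.best_map + self.min_delta:
            # New best: this is the epoch whose checkpoint would be kept.
            self.best_map = val_map
            self.best_epoch = epoch
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


def select_best_epoch(
    val_maps: Sequence[float], patience: int = 3, min_delta: float = 0.0
) -> Tuple[Optional[int], Optional[int]]:
    """Return (best_epoch, stopped_at); stopped_at is None if never triggered."""
    stopper = EarlyStopper(patience=patience, min_delta=min_delta)
    stopped_at = None
    for epoch, val_map in enumerate(val_maps):
        if stopper.update(epoch, val_map):
            stopped_at = epoch
            break
    return stopper.best_epoch, stopped_at
```

In this sketch a tie with the current best does not reset the patience counter, so a plateau eventually ends training; a real training loop would also save the model weights whenever `best_epoch` changes.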

Inference capabilities cover batch processing, real-time detection from webcams or RTSP streams, and per-instance segmentation masks. The library also provides tools for converting between dataset formats and caching model weights locally for faster repeated predictions.
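To make the dataset format conversion concrete, the sketch below converts a single bounding box between the two conventions. COCO stores boxes as absolute `[x_min, y_min, width, height]` in pixels, while YOLO stores `[x_center, y_center, width, height]` normalized by the image size. The function names `coco_to_yolo` and `yolo_to_coco` are hypothetical and are not taken from the library.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]


def coco_to_yolo(bbox: Sequence[float], image_width: int, image_height: int) -> Box:
    """Convert absolute [x_min, y_min, w, h] to normalized [cx, cy, w, h]."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    x_min, y_min, width, height = bbox
    return (
        (x_min + width / 2) / image_width,    # normalized center x
        (y_min + height / 2) / image_height,  # normalized center y
        width / image_width,
        height / image_height,
    )


def yolo_to_coco(bbox: Sequence[float], image_width: int, image_height: int) -> Box:
    """Convert normalized [cx, cy, w, h] back to absolute [x_min, y_min, w, h]."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    cx, cy, width, height = bbox
    abs_w = width * image_width
    abs_h = height * image_height
    return (
        cx * image_width - abs_w / 2,   # top-left x in pixels
        cy * image_height - abs_h / 2,  # top-left y in pixels
        abs_w,
        abs_h,
    )
```

A full converter would also remap category IDs (COCO IDs are arbitrary integers, YOLO class indices start at 0) and write one label file per image; those steps are omitted here.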

Features

Instance Segmentation Engines - Produces per-instance segmentation masks for every detected object using a single unified model API.
Object Detection - Runs a pretrained vision transformer model to locate and classify objects with bounding boxes.
Real-Time Object Detection - Runs transformer-based detection models on images and returns bounding boxes and class labels with low latency.
Image Segmentation - Runs pretrained segmentation models on static images and returns detected objects with masks and labels.
