This project is an object detection framework that implements the YOLOv3 architecture in Keras and TensorFlow. It locates and classifies multiple objects in images and video streams, marking each detection with a bounding box.
The system includes a multi-GPU inference engine that distributes the computational load across several GPUs. It also provides a pipeline for building custom object detectors: pre-trained weights are retrained on annotated datasets so that the model recognizes user-defined object classes.
The framework covers model training and fine-tuning through a two-stage retraining process with weight optimization. It also includes utilities for configuring the network architecture from external files, converting weights between framework formats, and converting Pascal VOC annotations into plain text for training.
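As a rough illustration of the VOC-to-text conversion mentioned above, the sketch below parses one Pascal VOC XML annotation and emits one line of plain text. The output layout (`image_path` followed by `x_min,y_min,x_max,y_max,class_id` groups), the function name `voc_to_line`, and the class list are assumptions made for this example; the project's own converter may use a different layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list; a real project would load its own class names.
CLASSES = ["cat", "dog"]


def voc_to_line(xml_text, image_path, classes=CLASSES):
    """Convert one Pascal VOC annotation (XML string) to one text line.

    Assumed output layout: image_path followed by one
    'x_min,y_min,x_max,y_max,class_id' group per kept object.
    """
    root = ET.fromstring(xml_text)
    parts = [image_path]
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in classes:
            continue  # skip objects outside the chosen class list
        box = obj.find("bndbox")
        coords = [int(float(box.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        parts.append(",".join(map(str, coords + [classes.index(name)])))
    return " ".join(parts)


# Small hand-written annotation used only to exercise the function.
SAMPLE_XML = """
<annotation>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""

print(voc_to_line(SAMPLE_XML, "images/000001.jpg"))
# prints: images/000001.jpg 48,240,195,371,1
```

Keeping one image per line makes the training list easy to shuffle and split with ordinary text tools. Objects whose class is not in the list (here, `person`) are silently dropped, which is one possible policy among several.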
The project supports inference across static images, live streams, and sequential video files.