This project is an optical character recognition (OCR) tool that extracts hardcoded (burned-in) subtitles from video frames and converts them into synchronized subtitle files. It turns text that exists only as pixels in the video into editable, timestamped text, which makes the video more accessible and easier to translate.
The system runs text recognition on graphics processing units (GPUs) to increase its speed and accuracy. It includes a subtitle cleaning tool that applies custom mapping configurations to filter watermarks, channel logos, and duplicate lines out of the extracted text.
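The cleaning step can be pictured with a minimal sketch. It is not the project's actual implementation: the `Cue` type, the `MAPPING` layout, its keys, and the example patterns are all hypothetical, chosen only to show how a mapping configuration might remove watermark text, fix a common OCR misread, and merge consecutive duplicate lines.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle entry (hypothetical type, for illustration only)."""
    start: float  # seconds
    end: float    # seconds
    text: str


# Hypothetical mapping configuration; the real tool's format may differ.
MAPPING = {
    # Regular expressions for text to delete (watermarks, logo captions).
    "remove_patterns": [r"(?i)www\.\S+", r"(?i)\[channel logo\]"],
    # Literal replacements for recurring OCR misreads.
    "replace": {"|": "I"},
}


def clean_cues(cues, mapping=MAPPING):
    """Apply the mapping, drop empty cues, and merge consecutive duplicates."""
    cleaned = []
    for cue in cues:
        text = cue.text
        for pattern in mapping["remove_patterns"]:
            text = re.sub(pattern, "", text)
        for wrong, right in mapping["replace"].items():
            text = text.replace(wrong, right)
        text = " ".join(text.split())  # collapse leftover whitespace
        if not text:
            continue  # the cue contained nothing but filtered text
        if cleaned and cleaned[-1].text == text:
            # Same line as the previous cue: extend it instead of repeating it.
            cleaned[-1].end = max(cleaned[-1].end, cue.end)
            continue
        cleaned.append(Cue(cue.start, cue.end, text))
    return cleaned
```

In this sketch, a cue that consists only of a watermark is dropped entirely, and a line that OCR reports twice in a row becomes a single cue spanning both time ranges.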
The tool supports batch processing of multiple video files, provided they share the same resolution and subtitle region settings. It uses region-based extraction to isolate subtitles from background noise and synchronizes each recognized line with its video timestamps.
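As a rough illustration of region-based extraction and timestamp synchronization, the sketch below crops a fixed rectangle from a frame and turns per-frame recognition results into SubRip (SRT) text. It makes several assumptions that are not confirmed by the project: frames are represented as plain lists of pixel rows, the region is an `(x, y, width, height)` tuple, OCR has already produced one string per frame, and the output format is SRT. The function names are hypothetical.

```python
def crop_region(frame, region):
    """Return the sub-rectangle of a frame given as a list of pixel rows.

    region is (x, y, width, height) in pixels (assumed layout).
    """
    x, y, w, h = region
    return [row[x:x + w] for row in frame[y:y + h]]


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def frames_to_srt(frame_texts, fps):
    """Group runs of identical per-frame text into timestamped SRT cues.

    frame_texts[i] is the text recognized in frame i ("" if none).
    """
    cues = []
    current, start_idx = "", 0
    # The trailing "" sentinel flushes a cue that runs to the last frame.
    for idx, text in enumerate(list(frame_texts) + [""]):
        if text == current:
            continue
        if current:
            cues.append((start_idx / fps, idx / fps, current))
        current, start_idx = text, idx
    blocks = [
        f"{n}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        for n, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

Because the region is a single tuple of pixel coordinates, the same value can only be reused across a batch when every file has the same resolution and subtitle position, which is consistent with the batch-processing constraint described above.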