Whishper

Whishper is a graphical user interface for transcribing audio and video files into text using the Whisper model. It serves as a speech-to-text tool and subtitle file generator that converts spoken content into editable text and timed subtitle formats.

The project features an integrated transcription and translation interface, allowing users to refine automated results and convert transcribed text into different languages. It includes a visual editor for correcting speech recognition errors, adjusting segment timecodes, and performing bilingual translation reviews.

The system handles the full transcription workflow, from retrieving media via remote URLs to exporting final data. Supported export formats include SRT, VTT, JSON, and plain text.
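As an illustration of the export step, the sketch below renders a list of time-coded segments as SRT, VTT, JSON, and plain text. It is a minimal sketch, not Whishper's actual exporter: the `Segment` class and the functions `format_timestamp`, `to_srt`, `to_vtt`, `to_json`, and `to_text` are hypothetical names introduced here, and times are assumed to be stored as seconds.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Segment:
    """Hypothetical time-coded segment; times are in seconds."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float, decimal_sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', VTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_sep}{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """SRT: numbered cues, comma as the millisecond separator."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg.start, ",")
        end = format_timestamp(seg.end, ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg.text}\n")
    return "\n".join(blocks)


def to_vtt(segments: list[Segment]) -> str:
    """WebVTT: 'WEBVTT' header, dot as the millisecond separator."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg.start, ".")
        end = format_timestamp(seg.end, ".")
        blocks.append(f"{start} --> {end}\n{seg.text}\n")
    return "\n".join(blocks)


def to_json(segments: list[Segment]) -> str:
    """JSON: one object per segment with start, end, and text."""
    return json.dumps([asdict(seg) for seg in segments])


def to_text(segments: list[Segment]) -> str:
    """Plain text: segment texts only, one per line."""
    return "\n".join(seg.text for seg in segments)
```

The only difference between the two subtitle formats in this sketch is the header, the cue numbering, and the millisecond separator, which is why both share one timestamp helper.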

Features

Graphical User Interfaces - Provides a graphical user interface for transcribing audio and video files with the Whisper model.

Speech to Text Transcription - Converts spoken content from uploaded or remote media files into text transcripts and subtitles.

Audio and Video File Transcription - Extracts speech from local media files to produce offline subtitles, plain text, and timestamp data.

Transcription Exporters - Saves transcriptions and subtitles into various formats such as plain text, JSON, VTT, and SRT.

Transcript Editors - Provides a visual editor for refining transcription text with segment splitting and timing adjustments.

Subtitle Segment Management - Enables precise control over the flow of subtitles by modifying individual text segment timecodes.

Language Translation Services - Translates transcribed text into multiple languages to improve accessibility of audio and video content.

Speech Recognition Engines - Uses an optimized local inference engine to convert audio to text while maintaining data privacy.

Transcript Refinement - Provides tools to correct speech recognition errors and refine segment timing for accurate transcripts.

Multi-Format Exports - Exports internal transcription data into multiple standardized formats including SRT, VTT, and JSON.

Timestamped Subtitle Generators - Generates timestamped subtitle files in SRT and VTT formats for video playback.

Media Text Digitization - Turns spoken content from uploaded files or remote URLs into editable and searchable text documents.

Time-Coded Segment Mapping - Organizes transcribed text into time-coded segments to synchronize subtitles with the audio track.

AI Translation Tools - Converts transcribed spoken content into other languages using integrated translation engines.

Translation Editors - Offers an interface for reviewing and correcting automated translations against the original transcription.

Remote Media Fetching - Fetches audio and video files from remote URLs into a local buffer for processing.

Visual State Reconciliation - Keeps the visual transcription editor synchronized in real time with the underlying transcription data.

Bilingual Display Components - Provides a UI component that displays original and translated text segments side-by-side for linguistic verification.
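The segment editing features listed above (splitting a segment and adjusting timecodes) can be sketched as two small operations on time-coded segments. This is a hedged illustration only: `Segment`, `split_segment`, and `shift_segments` are hypothetical names, and the source does not describe how Whishper implements these edits internally.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    """Hypothetical time-coded segment; times are in seconds."""
    start: float
    end: float
    text: str


def split_segment(seg: Segment, split_time: float, split_char: int) -> tuple[Segment, Segment]:
    """Split one segment in two at a given time and character position."""
    if not seg.start < split_time < seg.end:
        raise ValueError("split_time must fall strictly inside the segment")
    if not 0 < split_char < len(seg.text):
        raise ValueError("split_char must fall strictly inside the text")
    first = Segment(seg.start, split_time, seg.text[:split_char].strip())
    second = Segment(split_time, seg.end, seg.text[split_char:].strip())
    return first, second


def shift_segments(segments: list[Segment], offset: float) -> list[Segment]:
    """Shift every timecode by offset seconds, clamping at zero."""
    return [
        replace(seg, start=max(0.0, seg.start + offset), end=max(0.0, seg.end + offset))
        for seg in segments
    ]
```

For example, `split_segment(Segment(0.0, 4.0, "Hello there. General remarks."), 2.0, 12)` yields one segment covering 0.0 to 2.0 with the text "Hello there." and a second covering 2.0 to 4.0 with "General remarks.". Returning new segments instead of mutating the input keeps the original available for undo in an editor.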

pluja/whishper