FunClip

FunClip is an open-source tool that transcribes speech from video files and clips segments based on text, speaker, or AI analysis. It combines speech recognition with speaker diarization, audio event detection, and visual content understanding to identify and extract relevant portions of a video.

The tool distinguishes itself through several integrated capabilities. It supports hotword-weighted speech recognition, which improves transcription accuracy for specific terms like names or jargon by boosting their probability during decoding. A large language model can interpret the transcribed text to automatically select video segments based on natural language prompts. Speaker diarization separates and labels audio segments by speaker identity, enabling clipping by a chosen speaker. Additionally, a visual-content understanding model analyzes video frames to select clips when the transcript alone is insufficient.
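As an illustration of clipping by speaker, the sketch below assumes the diarization step yields a list of (start, end, speaker) tuples in seconds. That layout, the function name spans_for_speaker, and the max_gap parameter are all hypothetical and not taken from FunClip's code. It merges the chosen speaker's segments into clip spans when the pause between them is short.

```python
from typing import List, Tuple

# Hypothetical diarization output: (start_sec, end_sec, speaker_label).
Segment = Tuple[float, float, str]


def spans_for_speaker(
    segments: List[Segment], speaker: str, max_gap: float = 0.5
) -> List[Tuple[float, float]]:
    """Return merged (start, end) spans for one speaker.

    Two consecutive spans of the same speaker are merged when the gap
    between them is at most max_gap seconds (an assumed tuning knob).
    """
    spans: List[Tuple[float, float]] = []
    for start, end, label in sorted(segments):
        if label != speaker:
            continue
        if spans and start - spans[-1][1] <= max_gap:
            # Extend the previous span instead of starting a new one.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans


segments = [
    (0.0, 2.0, "spk0"),
    (2.1, 4.0, "spk0"),
    (4.0, 7.5, "spk1"),
    (7.5, 9.0, "spk0"),
]
print(spans_for_speaker(segments, "spk0"))  # [(0.0, 4.0), (7.5, 9.0)]
```

The resulting spans could then be handed to whatever video-cutting step is in use. Merging short gaps avoids producing many tiny clips when a speaker pauses briefly.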

Beyond these core differentiators, FunClip generates SRT subtitle files for both the full video and each clipped segment. It provides a command-line interface for headless, scriptable execution of the entire recognition and clipping pipeline, as well as a web service interface accessible locally or over a network for browser-based use.
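To make the subtitle step concrete, here is a minimal sketch of SRT generation. It assumes recognition results are available as (start, end, text) cues in seconds. The helper names and the cue layout are illustrative assumptions, not FunClip's actual interface. The second helper shows one plausible way to produce a subtitle file for a clipped segment by shifting cue times so that the clip starts at zero.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start_sec, end_sec, text), assumed layout


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Render cues as SRT text: index, time range, caption, blank line."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


def rebase_cues(cues: List[Cue], clip_start: float, clip_end: float) -> List[Cue]:
    """Keep cues overlapping the clip and shift them so the clip starts at 0."""
    return [
        (max(s, clip_start) - clip_start, min(e, clip_end) - clip_start, t)
        for s, e, t in cues
        if e > clip_start and s < clip_end
    ]


full = [(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]
print(to_srt(full))                         # subtitles for the full video
print(to_srt(rebase_cues(full, 1.5, 3.0)))  # subtitles for one clip
```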

Features

Video Clip Extraction - An open-source tool that transcribes video speech and clips segments by text, speaker, or AI analysis.

Hotword Boosts - Improves speech recognition accuracy for specific terms by marking them as hotwords.

Speaker Diarizers - Separates and labels audio segments by speaker identity using clustering of voice embeddings.

Speaker-Based Video Clippers - Identify speakers in a video using speaker recognition and clip segments belonging to a chosen speaker.

Automated Video Transcribers - Transcribes speech from video files into text with word-level timestamps for downstream processing.

Transcription-Based Video Clippers - Transcribe a video's speech into text and clip segments matching text phrases you specify.

Transcription Term Boosts - Improve transcription accuracy for specific terms like names or entities by marking them as hotwords.

Command-Line Video Clippers - Run the recognition and clipping workflow directly from a terminal for automated or scripted use.

AI-Assisted Clip Selectors - Uses large language models to analyze transcripts and automatically select relevant video segments based on user prompts.

AI-Assisted Clips - Use a large language model to analyze a transcript and automatically select clip segments based on your prompt.

Text-Based Video Clippers - Extract video segments whose transcribed text matches words or phrases you specify.

Transcript-Based Video Clippers - Extract video segments that match text phrases you select from the transcription results.

Visual-Content Clips - Selects video segments by analyzing both visual content and audio when the transcript alone is insufficient.

LLM-Based Transcript Selectors - Uses a large language model to interpret transcribed text and select relevant video segments based on natural language prompts.

LLM-Based Video Clippers - Use a large language model to interpret the transcript and automatically pick relevant video segments.

Visual-Content Video Clippers - Select video segments by analyzing both visuals and audio using a video understanding model.

Hotword-Weighted Recognizers - Improves speech recognition accuracy for specific terms like names or jargon by accepting a custom hotword list.

Frame-Level Video Analyzers - Analyzes video frames with a vision model to select clips when the transcript alone is insufficient.

Subtitle Generators - Produce SRT subtitle files for the full video and for the clipped segments during the clipping process.

Video Subtitle Generators - Produces SRT subtitle files for both the full video and each clipped segment during the processing workflow.

Local Web Interfaces - Exposes the clipping functionality through a browser-based UI that can be accessed locally or over a network.
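The text-based clipping entries in this list depend on word-level timestamps. The sketch below shows one way a phrase could be mapped to a time span, assuming the transcript is a list of (token, start, end) tuples in seconds. The Word layout and find_phrase_span are hypothetical names for illustration and do not reflect FunClip's internals.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (token, start_sec, end_sec), assumed layout


def find_phrase_span(words: List[Word], phrase: str) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the first occurrence of phrase, or None.

    Matching is case-insensitive and compares whole whitespace-separated tokens.
    """
    target = phrase.lower().split()
    if not target:
        return None
    tokens = [w[0].lower() for w in words]
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            # Start of the first matched word, end of the last matched word.
            return words[i][1], words[i + len(target) - 1][2]
    return None


words = [
    ("welcome", 0.0, 0.4),
    ("to", 0.4, 0.5),
    ("the", 0.5, 0.6),
    ("demo", 0.6, 1.1),
]
print(find_phrase_span(words, "the demo"))  # (0.5, 1.1)
```

A real pipeline would likely also normalize punctuation and may pad the span by a small offset on each side so the cut does not clip the first or last syllable.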

modelscope/FunClip
