30 open-source projects similar to modelscope/funclip, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best FunClip alternative.
This project is an AI-driven suite of tools designed to repurpose long-form video content into short-form clips. It integrates a speech-to-text engine for automated transcription, a highlighting system that ranks engaging segments based on emotional hooks, and a video processor that converts horizontal footage into vertical formats. The system distinguishes itself through intelligent video cropping that utilizes face tracking and motion smoothing to keep subjects centered. It also employs an analysis system to extract viral highlights by scoring segments for engagement and practical value. T
This is a Windows application for automatic speech recognition that transcribes spoken audio from video files into timestamped SRT subtitle files. It serves as a subtitle generator and translation tool that converts media speech into synchronized text. The software functions as a batch media transcriber, allowing the simultaneous processing of multiple audio and video files to generate subtitles in bulk. It includes a translation workflow to convert transcriptions between different languages for the creation of bilingual or localized files. The system also provides text refinement capabiliti
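As an illustration of the timestamped SRT output mentioned above, the following is a minimal sketch of how recognized segments could be rendered as SRT text. The helper names `srt_timestamp` and `to_srt` and the `(start, end, text)` tuple layout are assumptions made for this example, not part of the application described.

```python
def srt_timestamp(seconds: float) -> str:
    """Format an offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start_sec, end_sec, text) tuples as SRT cue blocks.

    Hypothetical helper: a real transcriber would supply the segments.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue is: sequence number, time range, text, then a blank line.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.25, "Hello"), (1.25, 3.0, "world")]))
```

SRT uses a comma before the milliseconds field, which is why the formatter works in integer milliseconds rather than formatting a float directly.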
Gifify is a tool for converting video files into optimized animated GIFs. It functions as a video to GIF converter and optimization utility that extracts specific clips from video files and burns text or subtitle overlays directly into the frames. The project differentiates itself through specialized GIF optimization, using lossy compression, color count limiting, and custom color palette generation to reduce file sizes. It also provides precise control over the output by allowing users to adjust playback speed, reverse playback direction, and resize dimensions. The software covers a broad s
Bilive is a multimodal AI video pipeline and live stream recording tool designed to capture real-time broadcasts and automate the creation of highlight clips. It functions as a multi-platform stream orchestrator capable of distributing looped pre-recorded content and managing the automated upload of processed video clips to various destinations. The system distinguishes itself through AI-driven content generation, using comment density to detect high-energy segments and multimodal models to automatically produce descriptive titles and synchronized subtitles. It further utilizes image-to-image
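The comment-density idea can be sketched roughly as follows: bucket comment timestamps into fixed windows and flag windows whose count is well above the average. This is an assumed simplification for illustration, not Bilive's actual method; the function name `dense_windows` and the `window` and `factor` parameters are hypothetical.

```python
from collections import Counter


def dense_windows(comment_times, window=30.0, factor=2.0):
    """Return (start, end) windows whose comment count is at least
    `factor` times the mean count per window.

    comment_times: comment offsets in seconds from the stream start.
    Illustrative heuristic only; thresholds are assumptions.
    """
    if not comment_times:
        return []
    # Count comments per fixed-size window index.
    counts = Counter(int(t // window) for t in comment_times)
    n_windows = max(counts) + 1
    mean = len(comment_times) / n_windows
    return [
        (i * window, (i + 1) * window)
        for i in sorted(counts)
        if counts[i] >= factor * mean
    ]


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6, 40, 70, 100]
    print(dense_windows(times))
```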
PySceneDetect is a suite of tools for identifying cuts and transitions in video files using content, threshold, and histogram detection algorithms. It functions as a scene detector, frame extractor, statistics analyzer, metadata exporter, and video scene splitter. The project identifies scene boundaries and can divide video files into smaller clips using external processing tools. It allows for the extraction of representative image frames from detected changes and the export of scene lists into industry-standard formats such as EDL, FCP, HTML, OTIO, and CSV. The toolset includes capabilitie
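To make the idea of detecting cuts concrete, here is a minimal sketch that flags a cut wherever a per-frame score jumps by more than a threshold between consecutive frames. It deliberately does not use PySceneDetect's API and is not its algorithm; `find_cuts`, the per-frame brightness scores, and the default threshold are assumptions for illustration.

```python
def find_cuts(frame_scores, threshold=30.0):
    """Return indices of frames that start a new scene.

    frame_scores: one number per frame (for example mean brightness).
    A cut is reported at frame i when the score differs from frame i-1
    by more than `threshold`. Hypothetical stand-in for a real detector.
    """
    return [
        i
        for i in range(1, len(frame_scores))
        if abs(frame_scores[i] - frame_scores[i - 1]) > threshold
    ]


if __name__ == "__main__":
    scores = [10, 12, 11, 80, 82, 81, 20]
    print(find_cuts(scores))
```

The returned indices could then be turned into clip boundaries and handed to an external tool for splitting.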
Omniparse is a multimodal content parser and generative AI ingestion engine designed to convert documents, images, and multimedia into a uniform format. It functions as a data preprocessing pipeline that transforms diverse raw data sources into structured markdown to improve the performance of large language model workflows. The system extracts text and structural data from PDFs, images, audio, and video files. It includes a web crawler that converts dynamic website content into clean markdown and a multimodal transformation process that maps disparate input formats into a unified data schema
This project is an AI video post-production suite that uses large language models and programmatic tools to automate editing, transcription, and subtitle generation. It functions as an AI editing agent that translates natural language instructions into shell commands, providing a programmatic interface for manipulating media via FFmpeg. The toolkit includes a motion graphics engine that generates technical animations and visual overlays through code-driven rendering and mathematical definitions. It distinguishes itself by combining an AI-powered transcriber for word-level timestamps with an a
SmartSub is a cross-platform desktop application for AI-driven video transcription and subtitle generation. It converts audio and video files into text subtitles using local AI models and incorporates hardware acceleration to increase processing speed. The tool features a subtitle translator that uses large language models from providers such as OpenAI and DeepSeek to convert subtitles between different languages. It includes a visual editor for proofreading and polishing transcribed text, paired with a video preview for frame-accurate synchronization. The software supports batch processing of multi
This Python SDK provides a comprehensive toolkit for synthetic audio generation, voice cloning, and the development of conversational AI agents. It enables the creation of lifelike spoken audio from text, the replication of human voices through custom cloning, and the deployment of real-time voice agents capable of interacting with external large language models. The library distinguishes itself through deep integration of conversational AI capabilities, including the design of agent personas and the execution of real-time actions via APIs. It supports professional-grade audio production thro
Autocut is a text-based video editor and automatic speech recognition tool. It allows users to cut and merge video clips by modifying a text transcript instead of using a traditional timeline. The system operates as an FFmpeg video processor and subtitle manipulation utility. It converts spoken audio into text and compacts subtitle files into simplified formats, enabling the removal of unwanted video segments by deleting corresponding sentences from a transcription file. The project covers automated video transcription, non-linear video cutting, and subtitle file management. It supports hard
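The transcript-driven cutting described above can be sketched as follows: each sentence carries a time range, and the sentences that survive editing determine which ranges of the video to keep. This is an assumed illustration rather than Autocut's implementation; `keep_intervals`, the `(id, start, end)` layout, and `max_gap` are hypothetical.

```python
def keep_intervals(segments, kept_ids, max_gap=0.0):
    """Compute merged (start, end) ranges to keep in the output video.

    segments: list of (sentence_id, start_sec, end_sec) from a transcript.
    kept_ids: ids of sentences still present after the user edited the text.
    max_gap:  ranges separated by at most this many seconds are merged.
    """
    kept = sorted((start, end) for sid, start, end in segments if sid in kept_ids)
    merged = []
    for start, end in kept:
        if merged and start - merged[-1][1] <= max_gap:
            # Extend the previous range instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    transcript = [(1, 0.0, 2.0), (2, 2.0, 5.0), (3, 5.0, 7.5), (4, 7.5, 9.0)]
    # Sentence 3 was deleted from the transcript, so its range is dropped.
    print(keep_intervals(transcript, {1, 2, 4}))
```

Merging adjacent ranges keeps the number of cuts small, so a downstream video processor has fewer pieces to extract and concatenate.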
NarratoAI is an automated video production pipeline that uses large language models to generate scripts, voiceovers, and edited video commentary. It functions as a combined scriptwriter, voiceover generator, and video editor to streamline the creation of movie and television commentary content. The system automates the production workflow by converting input data into structured narrative scripts, synthesizing artificial speech for narration, and programmatically assembling video clips based on script timestamps. It also converts spoken audio from video files into written text for subtitles a
VideoCaptioner is an automated tool designed to generate and embed time-synchronized subtitles into video files. By leveraging speech recognition models, the software converts spoken audio into text and calculates precise timestamps to ensure captions align with the original media. The project operates as a local-first inference pipeline, performing all transcription tasks on the host machine to maintain data privacy. It utilizes a transformer-based neural network for speech recognition and integrates a multimedia framework to handle the technical aspects of video processing and subtitle stre
PHP-FFmpeg is an object-oriented wrapper for executing FFmpeg binary commands within PHP applications. It serves as a multimedia processing library and toolkit for transcoding, clipping, merging, and filtering audio and video files through a standardized programmatic interface. The project provides specialized drivers for video manipulation, audio editing, and media metadata extraction. These drivers allow for the application of visual filters, the modification of audio sample rates, and the probing of multimedia files to retrieve technical specifications and validate file integrity. The lib
mmaction2 is a PyTorch video understanding toolbox designed for training and evaluating deep learning models. It serves as a framework for action recognition, temporal localization, and spatio-temporal action detection, providing specialized tools for both pixel-based video analysis and skeleton-based action recognition. The project distinguishes itself through a modular architecture featuring registry-based component discovery and hierarchical, config-driven model assembly. It supports multi-modal feature fusion, integrating RGB frames, optical flow, and audio, and includes capabilities for
Kreuzberg is a document extraction engine that converts PDFs, Office files, images, and over 90 other formats into clean, structured text and metadata. It is built around a compiled Rust core that can be used as a native library, a command-line tool, a REST API server, or a WebAssembly module for browser-based processing. The system is designed to run entirely on self-hosted infrastructure, with no data leaving the user's environment. What distinguishes Kreuzberg is its breadth of integration surfaces and its pipeline architecture. It exposes extraction capabilities through native bindings fo
Deskhop is a DIY hardware KVM switch and USB HID input router. It provides the PCB layouts and firmware necessary to build a physical device that routes keyboard and mouse inputs between two computers. The project features a web-based configuration interface delivered via a virtual USB mass storage device, allowing users to manage hardware settings and calibration through a browser using WebHID. It supports seamless transitions between workstations via edge-of-screen cursor movement or hotkeys, including coordinate-mapped scaling to maintain vertical pointer position across monitors of differ
mlx-audio is an audio processing toolkit built on Apple MLX that provides speech transcription, text-to-speech synthesis, voice cloning, and audio source separation using local models. It offers an OpenAI-compatible REST API and web interface for running audio generation and transcription tasks, enabling drop-in integration with existing tools that follow that endpoint structure. The toolkit supports text-prompted audio source separation, allowing specific sounds to be isolated from mixed recordings based on natural language descriptions. It also provides voice cloning from a short reference
ACE Step 1.5 is a local text-to-music generation and audio editing system that runs on consumer hardware. It transforms plain-language descriptions into full-length songs with lyrics, and can edit existing audio through cover generation, vocal removal, track separation, and selective repainting. The system supports multilingual prompts and lyrics in over 50 languages, and provides precise control over musical structure including duration, BPM, key, and time signature. The project distinguishes itself through a dual-stream diffusion architecture that processes separate latent streams for vocal
WhisperLive is a real-time speech-to-text server that converts live audio streams into text using Whisper models. It functions as a backend service that receives microphone input via WebSockets and provides incremental transcriptions with word-level timestamps. The system utilizes a GPU-accelerated inference engine and a keyword-boosted transcription API to improve the recognition accuracy of domain-specific jargon, acronyms, and product names. It also includes a speaker diarization tool that clusters audio embeddings to identify and label different participants within a recording. Additiona
This project is a hardware-accelerated transcription server and offline subtitle generator. It functions as a speech-to-text tool that converts audio and video files into plain text, JSON, and SRT subtitle formats using the Whisper model. The system emulates the OpenAI Audio API: it runs a local server that mimics that interface, so existing clients can receive transcriptions without any changes to the client software. The service uses GPU acceleration to increase voice recognition speed and includes utilities for hardware detectio
This project is a comprehensive toolkit for on-device speech recognition, synthesis, and audio processing, specifically engineered for Apple Silicon. It provides a framework for building real-time, full-duplex voice agents that operate entirely offline, leveraging native hardware acceleration to maintain performance and privacy. By utilizing optimized machine learning models, the library enables local execution of complex audio tasks without reliance on external cloud services. The library distinguishes itself through its specialized focus on local, high-performance voice interaction. It incl
Buku is a personal bookmark manager that provides a command line interface, a portable bookmark database, and a self-hosted server for organizing web links. It functions as a command line knowledge base for saving, tagging, and searching web resources. The system features a portable, mergeable database that supports AES-256 encryption and is designed for cross-device data synchronization. It includes a RESTful API and a self-hosted web interface, allowing users to manage their collection via a browser or programmatic requests. Capabilities include automatic metadata fetching to populate page
SD.Next is an all-in-one web interface and multi-backend inference engine for generating, editing, and processing images and videos using diffusion models. It functions as a comprehensive tool for diffusion model management and an automated image processing pipeline for bulk operations. The project is distinguished by its hardware-backend abstraction layer, which provides automatic detection and acceleration for NVIDIA CUDA, AMD ROCm, Intel OpenVINO, and DirectML. It features a headless generative API and a programmatic command interface, allowing users to trigger tasks via REST API or CLI wi
Valetudo is a custom firmware project for robot vacuums that enables local-only control, removing dependencies on cloud servers and protecting user privacy. It replaces proprietary vendor binaries with open source software to ensure that data, including floor plans and images, is not uploaded to external clouds. The project distinguishes itself by providing a rooted firmware installation process that prevents forced vendor updates. It implements a standardized control interface across different hardware brands and utilizes an MQTT-based message bus to facilitate integration with open source a
Kdenlive is an open-source non-linear video editing suite designed for digital video post-production. Built on the MLT Framework and utilizing KDE Frameworks for its user interface, it provides a multi-track environment for assembling clips, applying transitions, and rendering final video files. The editor distinguishes itself through a comprehensive set of animation and effect tools, including keyframe-based parameter animation with a visual curve editor for fine-tuning transitions. It supports advanced visual modifications such as clip speed remapping, effect region masking, and the integra
Mailpile is an encrypted email client and high-volume mail indexer that provides a web-based portal for managing electronic mail. It functions as a private email management system designed to protect user privacy and data control. The project features a search engine optimized for indexing and retrieving large volumes of email on consumer hardware. It includes a Bayesian email filter for automated message classification and tagging. The system supports secure communication through the integration of public-key encryption and signing for sending and receiving messages. Additional capabilities
WWDC is a native macOS video player and conference session manager designed for streaming and organizing developer conference videos. It functions as a video transcription browser and annotation tool, allowing users to track viewing progress and organize technical sessions into personalized learning paths. The application enables navigation through videos via searchable, multi-language text transcripts. Users can create searchable reference points by annotating specific video timestamps with custom notes and distribute content by sharing session links or extracting short video clips. The sys
pyAudioAnalysis is a Python library and framework for audio signal processing and analysis. It provides tools for extracting mathematical representations of sound, such as spectrograms, and implements a system for training and evaluating machine learning models to classify audio segments based on acoustic patterns. The project includes dedicated utilities for audio segmentation, which allow for the removal of silence and the detection of specific audio events to divide recordings into meaningful sections. It also provides data visualization capabilities that use dimensionality reduction to ma