# collabora/whisperlive

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/collabora-whisperlive).**

3,819 stars · 526 forks · Python · MIT

## Links

- GitHub: https://github.com/collabora/WhisperLive
- awesome-repositories: https://awesome-repositories.com/repository/collabora-whisperlive.md

## Topics

`dictation` `obs` `openai` `openvino` `openvino-intel` `tensorrt` `tensorrt-llm` `text-to-speech` `translation` `voice-recognition` `whisper` `whisper-tensorrt`

## Description

WhisperLive is a real-time speech-to-text server that converts live audio streams into text using Whisper models. It runs as a backend service that receives microphone audio over WebSockets and returns incremental transcriptions with word-level timestamps.
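
The receive side of that flow can be pictured with a minimal buffering sketch. It assumes, hypothetically, that each binary WebSocket frame carries 16 kHz mono float32 samples and that the server transcribes non-overlapping one-second windows. `StreamingAudioBuffer` is an illustrative name, not part of WhisperLive, whose real buffering logic may differ (for example, by re-transcribing a growing window).

```python
from array import array


class StreamingAudioBuffer:
    """Hypothetical helper: accumulates float32 samples received over a socket
    and hands out fixed-length windows for incremental transcription."""

    def __init__(self, sample_rate: int = 16_000, window_seconds: float = 1.0) -> None:
        self.sample_rate = sample_rate
        self.window_size = int(sample_rate * window_seconds)
        self._samples = array("f")  # float32 samples not yet transcribed

    def add_chunk(self, payload: bytes) -> None:
        # Assumption: each binary frame holds raw float32 samples in native byte order.
        chunk = array("f")
        chunk.frombytes(payload)
        self._samples.extend(chunk)

    def ready_windows(self):
        # Yield every complete window; any remainder stays buffered for later.
        while len(self._samples) >= self.window_size:
            window = self._samples[: self.window_size]
            del self._samples[: self.window_size]
            yield window

    @property
    def pending_seconds(self) -> float:
        # Duration of audio received but not yet handed to the model.
        return len(self._samples) / self.sample_rate
```

A caller would invoke `add_chunk` for each incoming frame and run the model on whatever `ready_windows` yields.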

The system uses a GPU-accelerated inference engine, and a keyword-boosted transcription API improves recognition of domain-specific jargon, acronyms, and product names. It also includes a speaker diarization tool that clusters audio embeddings to identify and label the different participants in a recording.
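
The source says only that the diarization tool clusters audio embeddings. As an illustration of that idea, the sketch below uses greedy online clustering by cosine similarity: each embedding joins the closest existing speaker centroid if the similarity clears a threshold, otherwise it starts a new speaker. The algorithm choice, the 0.75 threshold, and the `SPEAKER_NN` label format are all assumptions, not WhisperLive's documented method.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_speakers(embeddings: List[Sequence[float]], threshold: float = 0.75) -> List[str]:
    """Greedy online clustering (an assumed scheme): match each embedding to the
    most similar speaker centroid, or open a new speaker below the threshold."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[str] = []
    for emb in embeddings:
        best_idx, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            # Fold the embedding into the matched speaker's running-mean centroid.
            n = counts[best_idx]
            centroids[best_idx] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best_idx], emb)]
            counts[best_idx] = n + 1
        else:
            centroids.append(list(emb))
            counts.append(1)
            best_idx = len(centroids) - 1
        labels.append(f"SPEAKER_{best_idx:02d}")
    return labels
```

Offline diarizers often use agglomerative or spectral clustering over all embeddings instead; the greedy variant is shown because it also works on a live stream.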

Additional capabilities include high-throughput audio processing via batch inference and TensorRT acceleration, as well as audio signal normalization and recording state control. The service supports live audio captioning through segment-based incremental rendering.
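
Batch inference can be sketched as grouping queued segments from different clients into fixed-size batches so that one GPU call serves several streams. `PendingSegment`, `make_batches`, and `pad_batch` are hypothetical names. Zero-padding to the longest segment is a simplification, since Whisper models normally operate on fixed 30-second input windows.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class PendingSegment:
    client_id: str  # which WebSocket client the audio came from
    samples: List[float]


def make_batches(queue: List[PendingSegment], max_batch_size: int = 8) -> Iterator[List[PendingSegment]]:
    """Split the pending queue into batches of at most max_batch_size segments."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(queue), max_batch_size):
        yield queue[start : start + max_batch_size]


def pad_batch(batch: List[PendingSegment]) -> List[List[float]]:
    """Zero-pad every segment to the longest one so the batch is rectangular."""
    if not batch:
        return []
    longest = max(len(seg.samples) for seg in batch)
    return [seg.samples + [0.0] * (longest - len(seg.samples)) for seg in batch]
```

After the batched call returns, each result row is routed back to its `client_id`.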

## Tags

### Artificial Intelligence & ML

- [Real-Time Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/real-time-transcription.md) — Converts live audio streams into text in real time using Whisper models for immediate accessibility.
- [Audio Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription.md) — Provides a backend service that streams microphone input and delivers incremental text transcriptions.
- [GPU-Accelerated Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-accelerated-inference.md) — Employs a GPU-accelerated inference engine to optimize throughput for multilingual speech recognition.
- [Real-Time Speech-to-Text Servers](https://awesome-repositories.com/f/artificial-intelligence-ml/real-time-speech-to-text-servers.md) — Functions as a real-time audio transcription server using Whisper models and WebSocket streaming.
- [Speaker Diarization](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization.md) — Clusters audio feature vectors to distinguish and segment different speakers within a single audio stream.
- [Whisper-Based Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-to-text-engines/whisper-based-engines.md) — Uses the faster-whisper engine with a CTranslate2 backend to improve transcription speed and reduce memory usage.
- [High-Throughput Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/high-throughput-transcription.md) — Processes multiple simultaneous audio streams via GPU batching to achieve high transcription throughput.
- [Word-Level Timestamps](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/word-level-timestamps.md) — Produces start and end timestamps and a confidence score for each transcribed word. ([source](https://cdn.jsdelivr.net/gh/collabora/whisperlive@main/README.md))
- [Inference Acceleration Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-acceleration-engines.md) — Uses TensorRT for high-performance inference, accelerating speech-to-text processing. ([source](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md))
- [Incremental Transcription Previews](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-transcription/incremental-transcription-previews.md) — Incrementally renders transcribed text segments on the screen as they are emitted by the backend. ([source](https://github.com/collabora/WhisperLive/blob/main/Audio-Transcription-iOS/README.md))
- [Technical Jargon Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-transcription/medical-domain/technical-jargon-optimizations.md) — Improves transcription accuracy for domain-specific technical terms using keyword boosting.
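
Word-level timestamps and confidence scores are enough to build subtitle cues. The sketch below assumes a hypothetical `Word` record (text, start and end in seconds, probability in [0, 1]), formats times in SRT style, and brackets words below an arbitrary 0.5 confidence cut-off. The field names and the cut-off are illustrative, not WhisperLive's output schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float  # seconds
    probability: float  # model confidence in [0, 1]


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT-style HH:MM:SS,mmm timestamp."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_cue(words: List[Word], min_probability: float = 0.5) -> str:
    """Build one caption cue spanning the first to last word, marking doubtful words."""
    text = " ".join(w.text if w.probability >= min_probability else f"[{w.text}?]" for w in words)
    return f"{format_timestamp(words[0].start)} --> {format_timestamp(words[-1].end)}\n{text}"
```

For example, a low-confidence product name would render as `[WhisperLive?]`, flagging it for review.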

### Networking & Communication

- [Audio Transcription WebSockets](https://awesome-repositories.com/f/networking-communication/websocket-to-stream-bridges/audio-transcription-websockets.md) — Uses WebSockets to stream raw PCM audio from the client to the server for real-time processing.
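
On the client side, microphone capture usually yields signed 16-bit PCM. The sketch below converts it to float32 in [-1, 1) and slices it into binary payloads ready to send over a WebSocket. The float32 wire format and the 4096-sample frame size are assumptions for illustration; check the client code for the exact format the server expects.

```python
import struct
from typing import Iterator


def pcm16_to_float32_bytes(pcm: bytes) -> bytes:
    """Convert little-endian signed 16-bit PCM to little-endian float32 in [-1, 1)."""
    if len(pcm) % 2:
        raise ValueError("16-bit PCM must have an even number of bytes")
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm)
    return struct.pack(f"<{count}f", *(s / 32768.0 for s in samples))


def frames(pcm: bytes, samples_per_frame: int = 4096) -> Iterator[bytes]:
    """Yield fixed-size binary payloads; the last frame may be shorter."""
    data = pcm16_to_float32_bytes(pcm)
    step = samples_per_frame * 4  # 4 bytes per float32 sample
    for start in range(0, len(data), step):
        yield data[start : start + step]
```

Each yielded payload would be sent as one binary WebSocket message.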

### Data & Databases

- [Inference Batching](https://awesome-repositories.com/f/data-databases/request-batching/inference-batching.md) — Groups multiple concurrent user audio segments into single GPU calls to maximize system throughput.
- [Transcription Term Boosts](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-information-retrieval/query-interfaces-dsls/multi-term-search-processors/term-weighting-algorithms/transcription-term-boosts.md) — Provides a mechanism to boost specific technical terms and jargon during the transcription decoding process.
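
The source does not say how term boosting is implemented. One common technique with Whisper decoders is to bias decoding by supplying domain terms in the prompt; faster-whisper, for instance, accepts an `initial_prompt` argument to `WhisperModel.transcribe`. The hypothetical helper below only prepares such a string: it deduplicates terms case-insensitively, keeps their order, and stops at a character budget (a rough stand-in for Whisper's prompt-token limit).

```python
from typing import Iterable


def build_boost_prompt(terms: Iterable[str], max_chars: int = 200) -> str:
    """Join unique, non-empty terms with ", " without exceeding max_chars."""
    seen = set()
    kept = []
    length = 0
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue
        added = len(term) + (2 if kept else 0)  # include the ", " separator
        if length + added > max_chars:
            break
        seen.add(key)
        kept.append(term)
        length += added
    return ", ".join(kept)
```

Whether WhisperLive's keyword-boosted API uses prompting, logit biasing, or another mechanism is not stated in this entry.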

### Graphics & Multimedia

- [Live Captioning Integrations](https://awesome-repositories.com/f/graphics-multimedia/live-captioning-integrations.md) — Displays incrementally processed speech as text on screen for real-time live captioning.
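
A live caption overlay typically shows only the newest one or two lines of a running transcript. The sketch below wraps the text with the standard library's `textwrap.wrap` and keeps the last `max_lines` lines. The 32-character width and two-line box are illustrative defaults, not values taken from WhisperLive.

```python
import textwrap
from typing import List


def caption_lines(transcript: str, width: int = 32, max_lines: int = 2) -> List[str]:
    """Wrap the running transcript and return only the newest caption lines."""
    if max_lines <= 0:
        return []
    lines = textwrap.wrap(transcript, width=width)
    return lines[-max_lines:]
```

Calling this after every transcription update keeps the on-screen box at a fixed height while the text scrolls.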

### User Interface & Experience

- [Incremental Text Rendering](https://awesome-repositories.com/f/user-interface-experience/incremental-text-rendering.md) — Updates the user interface incrementally by appending transcribed text chunks as they are emitted.
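
Incremental rendering has to cope with a trailing segment that the server may still revise. The sketch below keeps finalized text separate from the latest partial hypothesis, so finished lines are appended once while the partial tail is replaced on each update. It assumes, hypothetically, that each server message carries a list of segment dicts with `start`, `end`, `text`, and a boolean `completed` flag; the actual message schema may differ.

```python
from typing import Dict, List


class TranscriptView:
    """Hypothetical UI model: append finalized segments, replace the partial one."""

    def __init__(self) -> None:
        self.completed: List[str] = []
        self.partial: str = ""
        self._seen_starts = set()  # start times of segments already finalized

    def update(self, segments: List[Dict]) -> str:
        self.partial = ""
        for seg in segments:
            text = seg["text"].strip()
            if seg.get("completed", False):
                # Append each finalized segment only once, keyed by its start time.
                if seg["start"] not in self._seen_starts:
                    self._seen_starts.add(seg["start"])
                    self.completed.append(text)
            else:
                # An unfinished segment overwrites the previous partial hypothesis.
                self.partial = text
        return self.render()

    def render(self) -> str:
        return " ".join(self.completed + ([self.partial] if self.partial else []))
```

Keying on start time assumes the server resends finalized segments with stable timestamps.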
