# const-me/whisper

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/const-me-whisper).**

10,489 stars · 957 forks · C++ · MPL-2.0

## Links

- GitHub: https://github.com/Const-me/Whisper
- awesome-repositories: https://awesome-repositories.com/repository/const-me-whisper.md

## Description

Whisper is a high-performance speech-to-text inference engine that runs OpenAI's Whisper automatic speech recognition models on the GPU, using compute shaders to accelerate the transcription of spoken audio into written text.

The system targets high-speed processing of both recorded audio files and live microphone streams. For live input, it uses voice activity detection to analyze raw audio in real time and triggers the inference engine only when human speech is detected.
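The gating idea in the paragraph above can be illustrated with a minimal sketch. It assumes a simple energy threshold as the speech detector, which is not necessarily what this project does. The names `frameRms` and `SpeechGate`, the threshold value, and the hangover length are all hypothetical and chosen only for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: root-mean-square level of one frame of mono PCM
// samples, assumed to be normalized to the range [-1, 1].
inline float frameRms(const float* samples, std::size_t count)
{
    if (count == 0)
        return 0.0f;
    double sum = 0.0;
    for (std::size_t i = 0; i < count; i++)
        sum += static_cast<double>(samples[i]) * samples[i];
    return static_cast<float>(std::sqrt(sum / static_cast<double>(count)));
}

// Hypothetical gate: opens on a frame whose RMS reaches the threshold and
// stays open for `hangoverFrames` further frames, so that short pauses
// inside a sentence do not cut the speech segment.
class SpeechGate
{
public:
    SpeechGate(float threshold, std::size_t hangoverFrames)
        : threshold_(threshold), hangover_(hangoverFrames) {}

    // Returns true when the frame should be forwarded to the recognizer.
    bool push(const float* samples, std::size_t count)
    {
        if (frameRms(samples, count) >= threshold_)
            remaining_ = hangover_ + 1; // loud frame: open or re-arm the gate
        if (remaining_ == 0)
            return false;               // gate closed: skip inference
        remaining_--;
        return true;
    }

private:
    float threshold_;
    std::size_t hangover_;
    std::size_t remaining_ = 0;
};
```

A capture loop would call `push` once per frame and hand the frame to the inference engine only when it returns `true`. An energy threshold is cheap but easily fooled by background noise, so treat this as a placeholder for whatever detector a real pipeline uses.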

Its capabilities include real-time audio capture, GPGPU inference optimization, and performance profiling that measures the execution time of individual compute shaders.

## Tags

### Artificial Intelligence & ML

- [Automatic Speech Recognition](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/automatic-speech-recognition.md) — Implements a high-performance automatic speech recognition system using OpenAI Whisper to transcribe audio in multiple languages.
- [Audio Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription.md) — Converts recorded audio files into text transcripts using a GPU-accelerated speech recognition model. ([source](https://github.com/const-me/whisper#readme))
- [Real-Time Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/real-time-transcription.md) — Provides instantaneous conversion of live microphone audio streams into text transcripts. ([source](https://github.com/const-me/whisper#readme))
- [GPU-Accelerated Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-accelerated-inference.md) — Leverages the parallel processing power of GPUs specifically to accelerate the inference phase of speech recognition.
- [GPGPU Execution Models](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/engines-runtimes-servers/inference-execution-models/gpgpu-execution-models.md) — Executes neural network computations directly on the graphics processor using compute shaders for high-performance speech recognition.
- [Inference Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/inference-optimization.md) — Optimizes model execution speed and computational efficiency by running heavy machine learning models on graphics hardware.
- [Whisper-Based Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-to-text-engines/whisper-based-engines.md) — Provides a high-performance inference engine specifically designed to run Whisper-based speech recognition models.
- [Real-Time Audio Transcribers](https://awesome-repositories.com/f/artificial-intelligence-ml/real-time-audio-transcribers.md) — Captures live microphone input and applies voice activity detection for immediate text generation.
- [Voice Activity Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-activity-detection.md) — Analyzes raw audio streams in real time to identify speech boundaries and trigger the inference engine.

### Graphics & Multimedia

- [Tensor Operations](https://awesome-repositories.com/f/graphics-multimedia/gpu-accelerated-shaders/tensor-operations.md) — Implements linear algebra and matrix multiplications within GPU kernels to eliminate CPU-to-GPU data transfer bottlenecks.
- [Audio Capture and Playback](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-capture-and-playback.md) — Captures audio input from hardware devices for real-time processing and speech activity detection. ([source](https://github.com/const-me/whisper#readme))

### Data & Databases

- [Graphics Memory Mapping](https://awesome-repositories.com/f/data-databases/data-access-querying/memory-mapped-file-access/graphics-memory-mapping.md) — Directly maps graphics memory into the application address space to enable high-speed audio data transfer to the GPU.

### Part of an Awesome List

- [Speech Recognition](https://awesome-repositories.com/f/awesome-lists/media/speech-recognition.md) — Windows desktop application utilizing GPU acceleration for transcription.