# pyannote/pyannote-audio

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/pyannote-pyannote-audio).**

9,203 stars · 1,015 forks · Jupyter Notebook · MIT

## Links

- GitHub: https://github.com/pyannote/pyannote-audio
- Homepage: http://pyannote.github.io
- awesome-repositories: https://awesome-repositories.com/repository/pyannote-pyannote-audio.md

## Topics

`overlapped-speech-detection` `pretrained-models` `pytorch` `speaker-change-detection` `speaker-diarization` `speaker-embedding` `speaker-recognition` `speaker-verification` `speech-activity-detection` `speech-processing` `voice-activity-detection`

## Description

pyannote.audio is a PyTorch toolkit for speaker diarization, speaker identification, and speech activity detection. Its primary purpose is to partition an audio recording into segments and assign each segment to a speaker identity, answering the question of who spoke when.

The project includes a framework for classifying speaker identities and a pipeline for distinguishing human speech from background noise. It provides tools for detecting overlapped speech, where multiple speakers talk simultaneously, and uses learnable band-pass filters (SincNet) to extract features from raw waveforms.

The toolkit features a comprehensive evaluation suite for measuring diarization error rates, speaker identification precision, and the accuracy of speaker boundaries. It also includes visualization utilities for generating detection error trade-off curves and precision-recall plots to analyze binary classification performance.
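
To make the diarization error rate concrete, here is a minimal frame-based sketch. It is not the toolkit's implementation: the function name `frame_der` is hypothetical, and it assumes fixed-rate frames, no overlapped speech, and speaker labels that are already mapped between reference and hypothesis (a full evaluation would first search for the best label mapping). The rate is the sum of false alarms, missed detections, and speaker confusions divided by the number of reference speech frames.

```python
from typing import Dict, Optional, Sequence


def frame_der(
    reference: Sequence[Optional[str]],
    hypothesis: Sequence[Optional[str]],
) -> Dict[str, float]:
    """Frame-level diarization error rate (simplified, hypothetical helper).

    Each sequence holds one label per frame; None marks non-speech.
    Assumes labels are already mapped and that no frame has two speakers.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    false_alarm = missed = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            total += 1  # reference speech frame
            if hyp is None:
                missed += 1  # speech that was not detected
            elif hyp != ref:
                confusion += 1  # speech assigned to the wrong speaker
        elif hyp is not None:
            false_alarm += 1  # speech detected where there is none

    if total == 0:
        raise ValueError("reference contains no speech frames")

    return {
        "false_alarm": false_alarm,
        "missed_detection": missed,
        "confusion": confusion,
        "total": total,
        "der": (false_alarm + missed + confusion) / total,
    }


reference = ["A", "A", "A", None, "B", "B", "B", "B", None, None]
hypothesis = ["A", "A", None, "A", "B", "A", "B", "B", None, "B"]
result = frame_der(reference, hypothesis)
print(result)  # 2 false alarms, 1 miss, 1 confusion over 7 speech frames
```

Because false alarms count in the numerator but not in the denominator, the rate can exceed 1.0 when the hypothesis contains a lot of spurious speech.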

## Tags

### Artificial Intelligence & ML

- [Speaker Diarization](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization.md) — Partitions audio recordings into segments and assigns each to a specific speaker identity to determine who spoke when. ([source](https://cdn.jsdelivr.net/gh/pyannote/pyannote-audio@main/README.md))
- [Diarization Evaluation Suites](https://awesome-repositories.com/f/artificial-intelligence-ml/model-evaluation-suites/diarization-evaluation-suites.md) — Provides a comprehensive suite of metrics for computing diarization error rates and speaker boundary precision.
- [Neural Network Implementations](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-network-implementations.md) — Provides a PyTorch-based neural architecture for extracting audio features and classifying speaker identities.
- [Diarization](https://awesome-repositories.com/f/artificial-intelligence-ml/prediction-visualization/accuracy-calculators/error-metrics/diarization.md) — Computes the overall diarization error rate by measuring false alarms, missed detections, and speaker confusion. ([source](http://pyannote.github.io/pyannote-metrics/reference.html))
- [Speaker Identification Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-identification-frameworks.md) — Provides a system for classifying speaker identities and measuring accuracy within supervised audio datasets.
- [Identification Accuracy Metrics](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-identification-frameworks/identification-accuracy-metrics.md) — Measures the precision and recall of supervised classification models to identify specific individuals.
- [Identification Error Metrics](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-identification-frameworks/identification-error-metrics.md) — Determines the precision and recall of supervised speaker classification using identification error rates. ([source](http://pyannote.github.io/pyannote-metrics/reference.html))
- [Speech Activity Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-activity-detection.md) — Implements a pipeline for distinguishing human speech from background noise through binary classification. ([source](http://pyannote.github.io/pyannote-metrics/reference.html))
- [Classification Metrics](https://awesome-repositories.com/f/artificial-intelligence-ml/classification-metrics.md) — Evaluates model performance using error rates and precision-recall plots to analyze voice classification.
- [Inference Strategies](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-strategies.md) — Implements a sliding-window inference mechanism to process long audio files for local speaker predictions.
- [Overlap Speech Handling](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/speech-translation-systems/simultaneous-speech-to-speech-translation/overlap-speech-handling.md) — Provides specialized tools to detect and manage segments where multiple speakers talk simultaneously.
- [Classification Error Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/prediction-visualization/accuracy-calculators/error-metrics/classification-error-analysis.md) — Breaks down speaker identification mistakes by labeling segments as correct, confused, missed, or false alarms. ([source](http://pyannote.github.io/pyannote-metrics/reference.html))
- [Boundary Accuracy Metrics](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization/boundary-accuracy-metrics.md) — Provides specialized metrics to evaluate the precision and purity of detected speaker boundaries in diarization tasks. ([source](http://pyannote.github.io/pyannote-metrics/reference.html))
- [Clustering Algorithms](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization/clustering-algorithms.md) — Implements clustering-based algorithms to group similar audio embeddings into distinct speaker identities.
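
The sliding-window inference idea listed above can be sketched generically as follows. This is an illustration under stated assumptions, not the toolkit's code: `sliding_window_average` and `toy_predict` are hypothetical names, the "model" is a stand-in function, and overlapping windows are combined by plain averaging.

```python
from typing import Callable, List, Sequence


def sliding_window_average(
    samples: Sequence[float],
    window: int,
    step: int,
    predict: Callable[[Sequence[float]], List[float]],
) -> List[float]:
    """Run `predict` on overlapping windows and average the per-sample scores.

    `predict` must return one score per input sample of the chunk it receives.
    `step` must not exceed `window`, so that every sample is covered.
    """
    if window <= 0 or step <= 0 or step > window:
        raise ValueError("need 0 < step <= window")
    n = len(samples)
    if n == 0:
        return []

    sums = [0.0] * n
    counts = [0] * n
    start = 0
    while True:
        end = min(start + window, n)  # the last window may be shorter
        scores = predict(samples[start:end])
        for offset, score in enumerate(scores):
            sums[start + offset] += score
            counts[start + offset] += 1
        if end == n:
            break
        start += step

    return [total / count for total, count in zip(sums, counts)]


def toy_predict(chunk: Sequence[float]) -> List[float]:
    """Stand-in for a local model: mean absolute amplitude of the chunk."""
    level = sum(abs(x) for x in chunk) / len(chunk)
    return [level] * len(chunk)


scores = sliding_window_average([1, 1, 1, 1, 0, 0], window=4, step=2, predict=toy_predict)
print(scores)
```

Samples covered by two windows receive the mean of both local predictions, which smooths the transition between a window that saw only loud audio and one that saw half silence.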

### Part of an Awesome List

- [Voice Activity Detection](https://awesome-repositories.com/f/awesome-lists/more/speech-and-audio-processing/voice-activity-detection.md) — Distinguishes human speech from background noise and non-speech audio to isolate active speaking segments.
- [Speech Processing](https://awesome-repositories.com/f/awesome-lists/ai/speech-processing.md) — Neural building blocks for speaker diarization.

### Graphics & Multimedia

- [Segmentation Evaluation](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/media-manipulation/media-processing-workflows/audio-analysis-synthesis/audio-segmentation-utilities/segmentation-evaluation.md) — Assesses the accuracy of detected speaker boundaries to determine how precisely speech turns are divided.
- [Audio Feature Extraction](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/media-manipulation/media-processing-workflows/audio-analysis-synthesis/audio-feature-extraction.md) — Employs learnable band-pass filters via SincNet for advanced audio feature extraction from raw waveforms.
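
As a rough illustration of the band-pass filters mentioned above, the sketch below builds one SincNet-style kernel as the difference of two windowed sinc low-pass filters. In SincNet only the two cutoff frequencies of each filter are learned; here they are fixed numbers, and the names `sinc_bandpass_kernel` and `gain_at`, the Hamming window, and the example cutoffs are assumptions made for this example rather than details taken from the toolkit.

```python
import cmath
import math
from typing import List


def _sinc(x: float) -> float:
    """Unnormalised sinc: sin(x) / x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x


def sinc_bandpass_kernel(
    low_hz: float, high_hz: float, sample_rate: float, kernel_size: int
) -> List[float]:
    """Band-pass FIR kernel defined only by its two cutoff frequencies."""
    if kernel_size < 3 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be an odd integer >= 3")
    if not 0 <= low_hz < high_hz <= sample_rate / 2:
        raise ValueError("need 0 <= low_hz < high_hz <= sample_rate / 2")

    f1 = low_hz / sample_rate   # normalised cutoffs (cycles per sample)
    f2 = high_hz / sample_rate
    half = kernel_size // 2
    kernel = []
    for i in range(kernel_size):
        n = i - half  # centre the kernel on n = 0
        # Difference of two ideal low-pass filters gives a band-pass filter.
        ideal = 2 * f2 * _sinc(2 * math.pi * f2 * n) - 2 * f1 * _sinc(2 * math.pi * f1 * n)
        # Hamming window reduces ripple caused by truncating the ideal filter.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel


def gain_at(kernel: List[float], freq_hz: float, sample_rate: float) -> float:
    """Magnitude of the kernel's frequency response at one frequency."""
    omega = 2 * math.pi * freq_hz / sample_rate
    return abs(sum(k * cmath.exp(-1j * omega * n) for n, k in enumerate(kernel)))


kernel = sinc_bandpass_kernel(300.0, 3400.0, sample_rate=16000.0, kernel_size=101)
print(gain_at(kernel, 1000.0, 16000.0), gain_at(kernel, 6000.0, 16000.0))
```

A frequency inside the band (1000 Hz) passes with a gain close to one, while a frequency outside it (6000 Hz) is strongly attenuated. In a trainable layer, `low_hz` and `high_hz` would be parameters updated by gradient descent, and the resulting kernels would be applied with a 1-D convolution.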
