The pyannote.audio toolkit is built on PyTorch and covers speaker diarization, speaker identification, and speech activity detection. Its primary purpose is diarization: partitioning an audio recording into segments and assigning each segment a speaker label, which answers the question of who spoke when.
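To make "who spoke when" concrete, a diarization result can be thought of as a list of time-stamped speaker turns. The sketch below is only an illustration in plain Python: the `Turn` type and the `speaking_time` helper are hypothetical names invented here, not part of the toolkit's API, and the timestamps are made-up example values.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Turn(NamedTuple):
    """One speaker turn (hypothetical type, not the toolkit's own)."""
    start: float   # seconds from the beginning of the recording
    end: float     # seconds from the beginning of the recording
    speaker: str   # anonymous label such as "A" or "B"


def speaking_time(turns: List[Turn]) -> Dict[str, float]:
    """Sum the duration of every turn, grouped by speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for turn in turns:
        totals[turn.speaker] += turn.end - turn.start
    return dict(totals)


# Made-up example: speaker A talks, then B, then A again.
turns = [
    Turn(0.0, 4.0, "A"),
    Turn(4.0, 6.5, "B"),
    Turn(6.5, 9.0, "A"),
]
totals = speaking_time(turns)
```

Note that the labels are arbitrary: diarization alone says that two segments share a speaker, not who that speaker is. Attaching a real identity is the job of speaker identification.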
The project includes a framework for classifying speaker identities and a pipeline for distinguishing human speech from background noise. It provides specialized tools for handling overlapped speech, where multiple speakers talk simultaneously, and uses learnable band-pass filters to extract features directly from the raw waveform.
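As a rough illustration of what a band-pass filter parameterized by its cutoffs looks like, the sketch below builds a windowed-sinc FIR kernel from just two numbers, a low and a high cutoff frequency. This is a generic signal-processing construction in the style popularized by SincNet and is an assumption about the general idea, not the toolkit's own code. The function names, kernel size, and cutoff values are hypothetical, and in a learnable setting the two cutoffs would be trainable parameters rather than fixed arguments.

```python
import math
from typing import List


def sinc_bandpass(low_hz: float, high_hz: float, sample_rate: int,
                  kernel_size: int = 101) -> List[float]:
    """Band-pass FIR kernel: difference of two ideal low-pass (sinc)
    filters, tapered with a Hamming window."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is symmetric")
    half = kernel_size // 2
    f1 = low_hz / sample_rate    # normalized cutoffs in cycles per sample
    f2 = high_hz / sample_rate
    kernel = []
    for i in range(kernel_size):
        n = i - half
        if n == 0:
            ideal = 2.0 * (f2 - f1)  # limit of the expression below at n = 0
        else:
            ideal = (math.sin(2 * math.pi * f2 * n)
                     - math.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel


def gain(kernel: List[float], freq_hz: float, sample_rate: int) -> float:
    """Magnitude of the kernel's frequency response at freq_hz."""
    half = len(kernel) // 2
    re = sum(h * math.cos(2 * math.pi * freq_hz * (i - half) / sample_rate)
             for i, h in enumerate(kernel))
    im = sum(h * math.sin(2 * math.pi * freq_hz * (i - half) / sample_rate)
             for i, h in enumerate(kernel))
    return math.hypot(re, im)


# Hypothetical example: a 300-3400 Hz band at a 16 kHz sampling rate.
kernel = sinc_bandpass(300.0, 3400.0, 16000)
passband_gain = gain(kernel, 1000.0, 16000)   # inside the band
stopband_gain = gain(kernel, 6000.0, 16000)   # well outside the band
```

The appeal of this parameterization is that each filter is described by two interpretable values instead of one free weight per tap, so a bank of such filters has far fewer parameters to learn than an unconstrained convolution of the same length.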
The toolkit features a comprehensive evaluation suite for measuring diarization error rates, speaker identification precision, and the accuracy of speaker boundaries. It also includes visualization utilities for generating detection error trade-off curves and precision-recall plots to analyze binary classification performance.
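The diarization error rate is conventionally defined as the sum of missed speech, false alarm speech, and speaker confusion, divided by the total amount of reference speech. The sketch below computes a simplified frame-level version of that ratio. It assumes one label per frame (so no overlapped speech), hypothesis labels that have already been mapped onto the reference labels, and no forgiveness collar around boundaries; a full evaluation would handle all three. The function name `frame_der` and the example label sequences are hypothetical.

```python
from typing import List, Optional


def frame_der(reference: List[Optional[str]],
              hypothesis: List[Optional[str]]) -> float:
    """Simplified frame-level diarization error rate.

    Each list holds one label per frame; None marks a non-speech frame.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            total += 1
            if hyp is None:
                missed += 1        # speech present but not detected
            elif hyp != ref:
                confusion += 1     # speech detected, wrong speaker
        elif hyp is not None:
            false_alarm += 1       # speech detected where there is none
    if total == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / total


# Made-up example: one missed frame, one false alarm, one confusion,
# out of seven reference speech frames.
reference = ["A", "A", "A", None, "B", "B", "B", "B"]
hypothesis = ["A", "A", None, "A", "B", "A", "B", "B"]
der = frame_der(reference, hypothesis)
```

Because false alarms count in the numerator but not in the denominator, this ratio can exceed 1 when a system hallucinates a lot of speech, which is why it is called a rate rather than a percentage of frames.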