---
tags:
- pyannote
- pyannote-audio
- pyannote-audio-pipeline
- speaker-diarization
license: mit
language:
- en
---

# Configuration

This model card outlines the setup of a speaker diarization model fine-tuned on synthetic medical audio data. Before starting, please ensure the following requirements are met:

1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `3.1` with `pip install pyannote.audio`
2. Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
3. Accept [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions
4. Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens)
5. Download the `pytorch_model.bin` and `config.yaml` files into your local directory

## Usage

### Load trained segmentation model

```python
import torch
from pyannote.audio import Model

# Load the original architecture; replace True with your own auth token if needed
model = Model.from_pretrained("pyannote/segmentation-3.0", use_auth_token=True)

# Path to the directory containing the downloaded model files
model_path = "models/pyannote_sd_normal"

# Load fine-tuned weights from the pytorch_model.bin file
model.load_state_dict(torch.load(model_path + "/pytorch_model.bin"))
```

### Load fine-tuned speaker diarization pipeline

```python
from pyannote.audio import Pipeline
from pyannote.audio.pipelines import SpeakerDiarization

# Initialize the pretrained pyannote pipeline; replace True with your own auth token if needed
pretrained_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token=True)

# Build a pipeline around the fine-tuned segmentation model, reusing the
# pretrained embedding and clustering components
finetuned_pipeline = SpeakerDiarization(
    segmentation=model,
    embedding=pretrained_pipeline.embedding,
    embedding_exclude_overlap=pretrained_pipeline.embedding_exclude_overlap,
    clustering=pretrained_pipeline.klustering,
)

# Load fine-tuned hyperparameters into the pipeline
finetuned_pipeline.load_params(model_path + "/config.yaml")
```

### GPU usage

```python
# Move the pipeline to GPU if one is available
if torch.cuda.is_available():
    gpu = torch.device("cuda")
    finetuned_pipeline.to(gpu)
    print("gpu: ", torch.cuda.get_device_name(gpu))
```

### Visualise diarization output

```python
# Run the pipeline on an audio file; in a notebook, the returned
# Annotation renders as a timeline of speaker turns
diarization = finetuned_pipeline("path/to/audio.wav")
diarization
```

### View speaker turns, speaker ID, and time

```python
# Iterate over speaker turns: each yields a segment, a track ID, and a speaker label
for speech_turn, track, speaker in diarization.itertracks(yield_label=True):
    print(f"{speech_turn.start:4.1f} {speech_turn.end:4.1f} {speaker}")
```

## Citations

```bibtex
@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
```

```bibtex
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
```
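
## Evaluation

If you have reference annotations, you can score the fine-tuned pipeline with `pyannote.metrics`. Below is a minimal sketch; the RTTM path and the `"file-id"` URI key are placeholders for your own data.

```python
from pyannote.database.util import load_rttm
from pyannote.metrics.diarization import DiarizationErrorRate

# Hypothetical reference file: load_rttm returns a dict mapping file URI -> Annotation
reference = load_rttm("path/to/reference.rttm")["file-id"]

# Run the fine-tuned pipeline on the matching audio
hypothesis = finetuned_pipeline("path/to/audio.wav")

# Compute diarization error rate (missed speech + false alarm + speaker confusion)
metric = DiarizationErrorRate()
der = metric(reference, hypothesis)
print(f"DER = {der * 100:.1f}%")
```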