This ASR system is a Conformer model trained with the RNN-T loss (with an auxiliary CTC loss to stabilize training). The model operates with a unigram tokenizer.
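To make the loss setup concrete, here is a minimal sketch of how an auxiliary CTC loss is typically blended with the transducer (RNN-T) loss during training. The `ctc_weight` value below is a hypothetical placeholder for illustration, not the weight used for this model (see the hyperparameters file for the actual configuration).

```python
def combined_loss(rnnt_loss: float, ctc_loss: float, ctc_weight: float = 0.3) -> float:
    """Weighted blend of the transducer loss and the auxiliary CTC loss.

    The CTC branch gives the encoder a direct alignment signal early in
    training, which helps stabilize the harder-to-optimize RNN-T objective.
    NOTE: ctc_weight=0.3 is an illustrative value, not this model's setting.
    """
    return (1.0 - ctc_weight) * rnnt_loss + ctc_weight * ctc_loss

print(combined_loss(12.0, 20.0))  # 0.7 * 12.0 + 0.3 * 20.0 = 14.4
```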
Architecture details are described in the [training hyperparameters file](https://github.com/speechbrain/speechbrain/blob/develop/recipes/LibriSpeech/ASR/transducer/hparams/conformer_transducer.yaml).
Streaming support makes use of Dynamic Chunk Training. Chunked attention is used for the multi-head attention module, along with an implementation of [Dynamic Chunk Convolutions](https://www.amazon.science/publications/dynamic-chunk-convolution-for-unified-streaming-and-non-streaming-conformer-asr).
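Conceptually, chunked attention restricts each frame to attending within its own chunk plus a limited number of left-context chunks, so no future frames beyond the current chunk are needed. The pure-Python sketch below illustrates the attention mask this produces; the `chunk_size` and `left_chunks` values are illustrative, not this model's actual configuration.

```python
def chunked_attention_mask(seq_len: int, chunk_size: int, left_chunks: int = 1):
    """Build a boolean attention mask for chunked attention.

    mask[q][k] is True when query frame q may attend to key frame k:
    frames in q's own chunk, plus up to `left_chunks` previous chunks.
    Future chunks are never visible, which is what enables streaming.
    """
    mask = [[False] * seq_len for _ in range(seq_len)]
    for q in range(seq_len):
        q_chunk = q // chunk_size
        start = max(0, (q_chunk - left_chunks) * chunk_size)
        end = min(seq_len, (q_chunk + 1) * chunk_size)
        for k in range(start, end):
            mask[q][k] = True
    return mask

mask = chunked_attention_mask(seq_len=6, chunk_size=2, left_chunks=1)
# Frame 3 is in chunk 1, so it sees chunk 0 and chunk 1 (frames 0..3):
print(mask[3])  # [True, True, True, True, False, False]
```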
The model was trained with support for different chunk sizes (including full context), so it is suitable for a range of streaming chunk sizes as well as offline transcription.
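The chunk size directly sets the theoretical streaming latency: the decoder must wait for a full chunk of encoder frames before emitting tokens for it. The sketch below assumes a 40 ms encoder frame stride (a common value after 4x subsampling in Conformer encoders); check the hyperparameters file for this model's actual stride.

```python
def chunk_latency_ms(chunk_size_frames: int, frame_stride_ms: float = 40.0) -> float:
    """Theoretical per-chunk latency of streaming decoding.

    ASSUMPTION: frame_stride_ms=40.0 is a typical post-subsampling stride,
    used here for illustration only.
    """
    return chunk_size_frames * frame_stride_ms

for chunk in (8, 16, 24):
    print(f"chunk size {chunk}: ~{chunk_latency_ms(chunk):.0f} ms latency")
```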
The system is trained with recordings sampled at 16kHz (single channel).
The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling `transcribe_file` if needed.
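For illustration, the normalization described above amounts to two steps: downmix multi-channel audio to mono, then resample to 16 kHz. The sketch below is a minimal pure-Python version of that idea; the real code uses a proper resampler, and the linear interpolation here is for illustration only.

```python
def normalize_audio(samples, in_rate: int, out_rate: int = 16000):
    """Sketch of audio normalization: mono downmix + resampling.

    `samples` is a list of per-channel sample lists. NOTE: linear
    interpolation is a crude stand-in for a real resampling filter.
    """
    # Downmix: average the channels sample by sample.
    n = len(samples[0])
    mono = [sum(ch[i] for ch in samples) / len(samples) for i in range(n)]
    if in_rate == out_rate:
        return mono
    # Naive linear-interpolation resampling to the target rate.
    out_len = int(n * out_rate / in_rate)
    out = []
    for j in range(out_len):
        pos = j * (n - 1) / max(out_len - 1, 1)
        i0 = int(pos)
        i1 = min(i0 + 1, n - 1)
        frac = pos - i0
        out.append(mono[i0] * (1 - frac) + mono[i1] * frac)
    return out

stereo = [[0.0, 1.0, 0.0, -1.0], [0.0, 1.0, 0.0, -1.0]]  # 2 channels, 4 samples
mono = normalize_audio(stereo, in_rate=8000, out_rate=16000)
print(len(mono))  # 8
```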