This ASR system is a Conformer model trained with the RNN-T loss (with an auxiliary CTC loss to stabilize training). The model operates with a unigram tokenizer.
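To make the loss setup concrete, here is a minimal sketch of how an auxiliary CTC loss is typically blended with the transducer (RNN-T) loss during training. The `ctc_weight` value below is a hypothetical placeholder for illustration, not the weight used for this model (see the hyperparameters file for the actual configuration).

```python
def combined_loss(rnnt_loss: float, ctc_loss: float, ctc_weight: float = 0.3) -> float:
    """Weighted blend of the transducer loss and the auxiliary CTC loss.

    The CTC branch gives the encoder a direct alignment signal early in
    training, which helps stabilize the harder-to-optimize RNN-T objective.
    NOTE: ctc_weight=0.3 is an illustrative value, not this model's setting.
    """
    return (1.0 - ctc_weight) * rnnt_loss + ctc_weight * ctc_loss

print(combined_loss(12.0, 20.0))  # 0.7 * 12.0 + 0.3 * 20.0 = 14.4
```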
Architecture details are described in the [training hyperparameters file](https://github.com/speechbrain/speechbrain/blob/develop/recipes/LibriSpeech/ASR/transducer/hparams/conformer_transducer.yaml).
Streaming support makes use of Dynamic Chunk Training. Chunked attention is used for the multi-head attention module, along with an implementation of [Dynamic Chunk Convolutions](https://www.amazon.science/publications/dynamic-chunk-convolution-for-unified-streaming-and-non-streaming-conformer-asr).
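Conceptually, chunked attention restricts each frame to attending within its own chunk plus a limited number of left-context chunks, so no future frames beyond the current chunk are needed. The pure-Python sketch below illustrates the attention mask this produces; the `chunk_size` and `left_chunks` values are illustrative, not this model's actual configuration.

```python
def chunked_attention_mask(seq_len: int, chunk_size: int, left_chunks: int = 1):
    """Build a boolean attention mask for chunked attention.

    mask[q][k] is True when query frame q may attend to key frame k:
    frames in q's own chunk, plus up to `left_chunks` previous chunks.
    Future chunks are never visible, which is what enables streaming.
    """
    mask = [[False] * seq_len for _ in range(seq_len)]
    for q in range(seq_len):
        q_chunk = q // chunk_size
        start = max(0, (q_chunk - left_chunks) * chunk_size)
        end = min(seq_len, (q_chunk + 1) * chunk_size)
        for k in range(start, end):
            mask[q][k] = True
    return mask

mask = chunked_attention_mask(seq_len=6, chunk_size=2, left_chunks=1)
# Frame 3 is in chunk 1, so it sees chunk 0 and chunk 1 (frames 0..3):
print(mask[3])  # [True, True, True, True, False, False]
```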
The model was trained with support for different chunk sizes (including full context), so it is suitable for a range of streaming chunk sizes as well as offline transcription.
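The chunk size directly sets the theoretical streaming latency: the decoder must wait for a full chunk of encoder frames before emitting tokens for it. The sketch below assumes a 40 ms encoder frame stride (a common value after 4x subsampling in Conformer encoders); check the hyperparameters file for this model's actual stride.

```python
def chunk_latency_ms(chunk_size_frames: int, frame_stride_ms: float = 40.0) -> float:
    """Theoretical per-chunk latency of streaming decoding.

    ASSUMPTION: frame_stride_ms=40.0 is a typical post-subsampling stride,
    used here for illustration only.
    """
    return chunk_size_frames * frame_stride_ms

for chunk in (8, 16, 24):
    print(f"chunk size {chunk}: ~{chunk_latency_ms(chunk):.0f} ms latency")
```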
The system is trained with recordings sampled at 16kHz (single channel).
The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling `transcribe_file` if needed.
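For illustration, the normalization described above amounts to two steps: downmix multi-channel audio to mono, then resample to 16 kHz. The sketch below is a minimal pure-Python version of that idea; the real code uses a proper resampler, and the linear interpolation here is for illustration only.

```python
def normalize_audio(samples, in_rate: int, out_rate: int = 16000):
    """Sketch of audio normalization: mono downmix + resampling.

    `samples` is a list of per-channel sample lists. NOTE: linear
    interpolation is a crude stand-in for a real resampling filter.
    """
    # Downmix: average the channels sample by sample.
    n = len(samples[0])
    mono = [sum(ch[i] for ch in samples) / len(samples) for i in range(n)]
    if in_rate == out_rate:
        return mono
    # Naive linear-interpolation resampling to the target rate.
    out_len = int(n * out_rate / in_rate)
    out = []
    for j in range(out_len):
        pos = j * (n - 1) / max(out_len - 1, 1)
        i0 = int(pos)
        i1 = min(i0 + 1, n - 1)
        frac = pos - i0
        out.append(mono[i0] * (1 - frac) + mono[i1] * frac)
    return out

stereo = [[0.0, 1.0, 0.0, -1.0], [0.0, 1.0, 0.0, -1.0]]  # 2 channels, 4 samples
mono = normalize_audio(stereo, in_rate=8000, out_rate=16000)
print(len(mono))  # 8
```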