nvidia/stt_ru_conformer_transducer_large

sasha-meister committed on Oct 24, 2022

Commit 4e152c9 • 1 Parent(s): 3e338bb

Update README.md

Files changed (1)

README.md +1 -0


@@ -159,6 +159,7 @@ The NeMo toolkit [3] was used for training the models for over several hundred e
 The vocabulary we use contains 33 characters:
 ```python
 [' ', 'а', 'б', 'в', 'г', 'д', 'е', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я']
 ```
 Rare symbols with diacritics were replaced during preprocessing.
 The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
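
As an illustration of the vocabulary and preprocessing notes in this hunk, here is a minimal sketch of how transcripts might be mapped onto the 33-character vocabulary. The README does not state the exact replacement rules, so the rule below (drop combining marks from any character that is not already in the vocabulary, so that `ё` becomes `е` while `й` is kept) is an assumption, and the helper names `strip_diacritics`, `normalize_text`, and `out_of_vocab` are hypothetical, not part of NeMo.

```python
import unicodedata

# Vocabulary copied from the README: a space plus 32 Cyrillic letters.
VOCAB = [' ', 'а', 'б', 'в', 'г', 'д', 'е', 'ж', 'з', 'и', 'й', 'к', 'л',
         'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш',
         'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я']
VOCAB_SET = set(VOCAB)


def strip_diacritics(ch: str) -> str:
    """Keep in-vocabulary characters; otherwise drop combining marks.

    Assumed rule, not taken from the README: characters already in the
    vocabulary (including 'й') are left alone, everything else is
    decomposed with NFD and its combining marks are removed.
    """
    if ch in VOCAB_SET:
        return ch
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalize_text(text: str) -> str:
    """Lowercase, compose to NFC, then replace characters with diacritics."""
    composed = unicodedata.normalize("NFC", text.lower())
    return "".join(strip_diacritics(ch) for ch in composed)


def out_of_vocab(text: str) -> list:
    """Return the sorted characters of `text` that the vocabulary lacks."""
    return sorted(set(text) - VOCAB_SET)


if __name__ == "__main__":
    print(len(VOCAB))                # 33
    print(normalize_text("Ёлка"))    # елка
    print(out_of_vocab("елка!"))     # ['!']
```

Checking `out_of_vocab` on the normalized transcripts before building a tokenizer is one way to catch symbols that the replacement step missed.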
