Commit 4e152c9 by sasha-meister
Parent(s): 3e338bb
Update README.md
README.md
CHANGED
@@ -159,6 +159,7 @@ The NeMo toolkit [3] was used for training the models for over several hundred e
 The vocabulary we use contains 33 characters:
 ```python
 [' ', 'а', 'б', 'в', 'г', 'д', 'е', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я']```
+
 Rare symbols with diacritics were replaced during preprocessing.
 
 The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
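
The README states that rare symbols with diacritics were replaced during preprocessing, but this commit does not show how. The following is a minimal sketch of one way such a step could look, not the authors' actual code. The helper name `normalize_text` is hypothetical, and two behaviors are assumptions: a symbol with a diacritic is replaced by its base letter (so 'ё' becomes 'е'), and anything still outside the 33-character vocabulary becomes a space.

```python
import unicodedata

# The 33-character vocabulary listed in the README: a space plus the
# lowercase Russian alphabet without 'ё'.
VOCAB = set(" абвгдежзийклмнопрстуфхцчшщъыьэюя")


def normalize_text(text: str) -> str:
    """Hypothetical helper: map arbitrary text onto VOCAB."""
    out = []
    # Compose to NFC first so that 'й' written as 'и' + combining breve
    # becomes a single in-vocabulary code point and is not stripped below.
    for ch in unicodedata.normalize("NFC", text.lower()):
        if ch in VOCAB:
            out.append(ch)
            continue
        # Assumption: a symbol with a diacritic is replaced by its base letter.
        base = "".join(
            c for c in unicodedata.normalize("NFD", ch)
            if not unicodedata.combining(c)
        )
        # Assumption: anything still outside the vocabulary becomes a space.
        out.append(base if base and all(c in VOCAB for c in base) else " ")
    # Collapse the runs of spaces introduced by the replacements.
    return " ".join("".join(out).split())


print(normalize_text("Ёлка и йод!"))
```

Checking membership in the vocabulary before decomposing matters here: a blanket NFD strip would also turn 'й' into 'и', since 'й' decomposes into 'и' plus a combining breve.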