---
tags:
- audio
- text-to-speech
- onnx
inference: false
language: en
datasets:
- CSTR-Edinburgh/vctk
license: apache-2.0
library_name: txtai
---
# ESPnet VITS Text-to-Speech (TTS) Model for ONNX
[espnet/kan-bayashi_vctk_vits](https://huggingface.co/espnet/kan-bayashi_vctk_tts_train_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave) exported to ONNX using the [espnet_onnx](https://github.com/espnet/espnet_onnx) library.
## Usage with txtai
[txtai](https://github.com/neuml/txtai) has a built-in Text to Speech (TTS) pipeline that makes using this model easy.
_Note: the following example requires txtai >= 7.5._
```python
import soundfile as sf
from txtai.pipeline import TextToSpeech
# Build pipeline
tts = TextToSpeech("NeuML/vctk-vits-onnx")
# Generate speech with speaker id
speech, rate = tts("Say something here", speaker=15)
# Write to file
sf.write("out.wav", speech, rate)
```
## Usage with ONNX
This model can also be run directly with ONNX provided the input text is tokenized. Tokenization can be done with [ttstokenizer](https://github.com/neuml/ttstokenizer).
Note that the txtai pipeline provides additional functionality, such as batching large inputs together, that would need to be reimplemented with this method.
```python
import numpy as np
import onnxruntime
import soundfile as sf
import yaml
from ttstokenizer import TTSTokenizer
# This example assumes the files have been downloaded locally
with open("vctk-vits-onnx/config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)
# Create model
model = onnxruntime.InferenceSession(
    "vctk-vits-onnx/model.onnx",
    providers=["CPUExecutionProvider"]
)
# Create tokenizer
tokenizer = TTSTokenizer(config["token"]["list"])
# Tokenize inputs
inputs = tokenizer("Say something here")
# Generate speech
outputs = model.run(None, {"text": inputs, "sids": np.array([15])})
# Write to file
sf.write("out.wav", outputs[0], 22050)
```
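Because the direct ONNX path does not split long inputs, one option is to break the text into sentence-sized chunks before tokenizing. The following is a minimal sketch of that idea and is not how txtai implements batching; `chunk_text` and `max_chars` are hypothetical names chosen for illustration.

```python
import re

def chunk_text(text, max_chars=200):
    """Group sentences into chunks of at most max_chars characters where possible."""
    # Split after sentence-ending punctuation followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate

    # Keep whatever is left over as the final chunk
    if current:
        chunks.append(current)
    return chunks

chunks = chunk_text("First sentence. Second one! A third? Done.", max_chars=30)
print(chunks)
```

Each chunk could then be tokenized and run through the model separately, with the resulting audio arrays joined using `np.concatenate`. A single sentence longer than `max_chars` is kept whole rather than split mid-sentence.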
## How to export
More information on how to export ESPnet models to ONNX can be [found here](https://github.com/espnet/espnet_onnx#text2speech-inference).
## Speaker reference
The [CSTR VCTK Corpus](https://datashare.ed.ac.uk/handle/10283/3443) includes speech data uttered by native speakers of English with various accents.
When using this model, set a `speaker` id from the reference table below. The `ref` column corresponds to the id in the VCTK dataset.
| SPEAKER | REF | AGE | GENDER | ACCENTS | REGION |
|----------:|-----:|------:|:---------|:---------------|:-----------------|
| 1 | 225 | 23 | F | English | Southern England |
| 2 | 226 | 22 | M | English | Surrey |
| 3 | 227 | 38 | M | English | Cumbria |
| 4 | 228 | 22 | F | English | Southern England |
| 5 | 229 | 23 | F | English | Southern England |
| 6 | 230 | 22 | F | English | Stockton-on-Tees |
| 7 | 231 | 23 | F | English | Southern England |
| 8 | 232 | 23 | M | English | Southern England |
| 9 | 233 | 23 | F | English | Staffordshire |
| 10 | 234 | 22 | F | Scottish | West Dumfries |
| 11 | 236 | 23 | F | English | Manchester |
| 12 | 237 | 22 | M | Scottish | Fife |
| 13 | 238 | 22 | F | Northern Irish | Belfast |
| 14 | 239 | 22 | F | English | SW England |
| 15 | 240 | 21 | F | English | Southern England |
| 16 | 241 | 21 | M | Scottish | Perth |
| 17 | 243 | 22 | M | English | London |
| 18 | 244 | 22 | F | English | Manchester |
| 19 | 245 | 25 | M | Irish | Dublin |
| 20 | 246 | 22 | M | Scottish | Selkirk |
| 21 | 247 | 22 | M | Scottish | Argyll |
| 22 | 248 | 23 | F | Indian | |
| 23 | 249 | 22 | F | Scottish | Aberdeen |
| 24 | 250 | 22 | F | English | SE England |
| 25 | 251 | 26 | M | Indian | |
| 26 | 252 | 22 | M | Scottish | Edinburgh |
| 27 | 253 | 22 | F | Welsh | Cardiff |
| 28 | 254 | 21 | M | English | Surrey |
| 29 | 255 | 19 | M | Scottish | Galloway |
| 30 | 256 | 24 | M | English | Birmingham |
| 31 | 257 | 24 | F | English | Southern England |
| 32 | 258 | 22 | M | English | Southern England |
| 33 | 259 | 23 | M | English | Nottingham |
| 34 | 260 | 21 | M | Scottish | Orkney |
| 35 | 261 | 26 | F | Northern Irish | Belfast |
| 36 | 262 | 23 | F | Scottish | Edinburgh |
| 37 | 263 | 22 | M | Scottish | Aberdeen |
| 38 | 264 | 23 | F | Scottish | West Lothian |
| 39 | 265 | 23 | F | Scottish | Ross |
| 40 | 266 | 22 | F | Irish | Athlone |
| 41 | 267 | 23 | F | English | Yorkshire |
| 42 | 268 | 23 | F | English | Southern England |
| 43 | 269 | 20 | F | English | Newcastle |
| 44 | 270 | 21 | M | English | Yorkshire |
| 45 | 271 | 19 | M | Scottish | Fife |
| 46 | 272 | 23 | M | Scottish | Edinburgh |
| 47 | 273 | 23 | M | English | Suffolk |
| 48 | 274 | 22 | M | English | Essex |
| 49 | 275 | 23 | M | Scottish | Midlothian |
| 50 | 276 | 24 | F | English | Oxford |
| 51 | 277 | 23 | F | English | NE England |
| 52 | 278 | 22 | M | English | Cheshire |
| 53 | 279 | 23 | M | English | Leicester |
| 54 | 280 | | | Unknown | |
| 55 | 281 | 29 | M | Scottish | Edinburgh |
| 56 | 282 | 23 | F | English | Newcastle |
| 57 | 283 | 24 | F | Irish | Cork |
| 58 | 284 | 20 | M | Scottish | Fife |
| 59 | 285 | 21 | M | Scottish | Edinburgh |
| 60 | 286 | 23 | M | English | Newcastle |
| 61 | 287 | 23 | M | English | York |
| 62 | 288 | 22 | F | Irish | Dublin |
| 63 | 292 | 23 | M | Northern Irish | Belfast |
| 64 | 293 | 22 | F | Northern Irish | Belfast |
| 65 | 294 | 33 | F | American | San Francisco |
| 66 | 295 | 23 | F | Irish | Dublin |
| 67 | 297 | 20 | F | American | New York |
| 68 | 298 | 19 | M | Irish | Tipperary |
| 69 | 299 | 25 | F | American | California |
| 70 | 300 | 23 | F | American | California |
| 71 | 301 | 23 | F | American | North Carolina |
| 72 | 302 | 20 | M | Canadian | Montreal |
| 73 | 303 | 24 | F | Canadian | Toronto |
| 74 | 304 | 22 | M | Northern Irish | Belfast |
| 75 | 305 | 19 | F | American | Philadelphia |
| 76 | 306 | 21 | F | American | New York |
| 77 | 307 | 23 | F | Canadian | Ontario |
| 78 | 308 | 18 | F | American | Alabama |
| 79 | 310 | 21 | F | American | Tennessee |
| 80 | 311 | 21 | M | American | Iowa |
| 81 | 312 | 19 | F | Canadian | Hamilton |
| 82 | 313 | 24 | F | Irish | County Down |
| 83 | 314 | 26 | F | South African | Cape Town |
| 84 | 316 | 20 | M | Canadian | Alberta |
| 85 | 317 | 23 | F | Canadian | Hamilton |
| 86 | 318 | 32 | F | American | Napa |
| 87 | 323 | 19 | F | South African | Pretoria |
| 88 | 326 | 26 | M | Australian | Sydney |
| 89 | 329 | 23 | F | American | |
| 90 | 330 | 26 | F | American | |
| 91 | 333 | 19 | F | American | Indiana |
| 92 | 334 | 18 | M | American | Chicago |
| 93 | 335 | 25 | F | New Zealand | English |
| 94 | 336 | 18 | F | South African | Johannesburg |
| 95 | 339 | 21 | F | American | Pennsylvania |
| 96 | 340 | 18 | F | Irish | Dublin |
| 97 | 341 | 26 | F | American | Ohio |
| 98 | 343 | 27 | F | Canadian | Alberta |
| 99 | 345 | 22 | M | American | Florida |
| 100 | 347 | 26 | M | South African | Johannesburg |
| 101 | 351 | 21 | F | Northern Irish | Derry |
| 102 | 360 | 19 | M | American | New Jersey |
| 103 | 361 | 19 | F | American | New Jersey |
| 104 | 362 | 29 | F | American | |
| 105 | 363 | 22 | M | Canadian | Toronto |
| 106 | 364 | 23 | M | Irish | Donegal |
| 107 | 374 | 28 | M | Australian | English |
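When starting from a VCTK id rather than a `speaker` id, a small lookup can translate between the two. The sketch below copies only four rows from the table above; `VCTK_TO_SPEAKER` and `speaker_for_ref` are hypothetical names, and accepting a leading `p` (as in `p240`) is an assumption about how the VCTK id may be written.

```python
# Subset of the reference table above: VCTK ref id -> model speaker id
VCTK_TO_SPEAKER = {225: 1, 240: 15, 294: 65, 374: 107}

def speaker_for_ref(ref):
    """Return the speaker id for a VCTK ref such as 240 or "p240"."""
    # Normalize "p240", "P240" and 240 to the integer 240
    key = int(str(ref).lower().lstrip("p"))
    if key not in VCTK_TO_SPEAKER:
        raise KeyError(f"VCTK ref {ref!r} is not in the lookup table")
    return VCTK_TO_SPEAKER[key]

print(speaker_for_ref("p240"))
```

The returned value can be passed as `speaker` to the txtai pipeline or wrapped in an array for the `sids` input. Extending the dictionary to all 107 rows follows the same pattern.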