---
license: apache-2.0
language:
- ja
library_name: nemo
tags:
- automatic-speech-recognition
- NeMo
---
# reazonspeech-nemo-v2
`reazonspeech-nemo-v2` is an automatic speech recognition model trained
on the [ReazonSpeech v2.0 corpus](https://huggingface.co/datasets/reazon-research/reazonspeech).
It can transcribe long-form Japanese audio clips that run up to
several hours.
## Model Architecture
The model features an improved Conformer architecture from
[Fast Conformer with Linearly Scalable Attention for Efficient
Speech Recognition](https://arxiv.org/abs/2305.05084).
* Subword-based RNN-T model with a total parameter count of 619M.
* The encoder uses [Longformer](https://arxiv.org/abs/2004.05150) attention
  with a local context size of 256 and a single global token.
* The decoder has a vocabulary of 3,000 tokens built with a
  [SentencePiece](https://github.com/google/sentencepiece)
  unigram tokenizer.
We trained this model for 1 million steps using the AdamW optimizer
with a Noam annealing schedule.
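As a rough illustration of how such a decoder vocabulary can be built, here is
a minimal SentencePiece sketch; the corpus file name and the options shown are
assumptions, not the exact recipe used for this model.
```python
# Minimal sketch: training a 3,000-token unigram tokenizer with SentencePiece.
# "transcripts.txt" (one transcript per line) and the options below are
# illustrative assumptions, not this model's actual training setup.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="transcripts.txt",
    model_prefix="tokenizer",
    vocab_size=3000,
    model_type="unigram",
)

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
print(sp.encode("今日はいい天気です", out_type=str))
```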
## Usage
We recommend using this model through our
[reazonspeech](https://github.com/reazon-research/reazonspeech)
library.
```python
from reazonspeech.nemo.asr import load_model, transcribe, audio_from_path
audio = audio_from_path("speech.wav")
model = load_model()
ret = transcribe(model, audio)
print(ret.text)
```
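If you prefer not to use the wrapper library, the `.nemo` checkpoint in this
repository can also be loaded with NeMo directly. The following is a minimal
sketch, assuming the downloaded checkpoint is named
`reazonspeech-nemo-v2.nemo`; the exact model class and the return format of
`transcribe()` may differ between NeMo versions.
```python
# Minimal sketch: loading the checkpoint with NeMo directly, assuming the
# .nemo file from this repository has been downloaded locally. The model
# class and transcribe() return format may differ across NeMo versions.
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecRNNTBPEModel.restore_from("reazonspeech-nemo-v2.nemo")

# transcribe() takes a list of audio file paths; RNN-T models may return
# a (best_hypotheses, all_hypotheses) pair depending on the version.
result = model.transcribe(["speech.wav"])
print(result)
```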
## License
[Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)