|
--- |
|
license: apache-2.0 |
|
language: |
|
- ja |
|
library_name: nemo |
|
tags: |
|
- automatic-speech-recognition |
|
- NeMo |
|
--- |
|
|
|
# reazonspeech-nemo-v2 |
|
|
|
`reazonspeech-nemo-v2` is an automatic speech recognition model trained on the [ReazonSpeech v2.0 corpus](https://huggingface.co/datasets/reazon-research/reazonspeech).

This model supports inference on long-form Japanese audio clips up to several hours long.
|
|
|
## Model Architecture |
|
|
|
The model features an improved Conformer architecture from [Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition](https://arxiv.org/abs/2305.05084).
|
|
|
* Subword-based RNN-T model. The total parameter count is 619M. |
|
|
|
* Encoder uses [Longformer](https://arxiv.org/abs/2004.05150) attention with a local context size of 256 and a single global token.
|
|
|
* Decoder has a vocabulary of 3,000 tokens constructed with a [SentencePiece](https://github.com/google/sentencepiece) unigram tokenizer.
|
|
|
We trained this model for 1 million steps using the AdamW optimizer with a Noam annealing schedule.
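
The figures above can be checked by loading the checkpoint directly with the NeMo toolkit. The snippet below is a minimal sketch and is not needed for normal use (see Usage below); the local filename and the `EncDecRNNTBPEModel` class are assumptions.

```python
# Minimal sketch (not needed for normal use): inspect the checkpoint with NeMo.
# The local filename and the EncDecRNNTBPEModel class are assumptions.
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecRNNTBPEModel.restore_from("reazonspeech-nemo-v2.nemo")

# Total parameter count (roughly 619M)
print(sum(p.numel() for p in model.parameters()))

# Decoder vocabulary size (3,000 SentencePiece unigram tokens)
print(model.tokenizer.vocab_size)
```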
|
|
|
## Usage |
|
|
|
We recommend using this model through our [reazonspeech](https://github.com/reazon-research/reazonspeech) library.
|
|
|
```python
from reazonspeech.nemo.asr import load_model, transcribe, audio_from_path

audio = audio_from_path("speech.wav")
model = load_model()
ret = transcribe(model, audio)
print(ret.text)
```
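
For long recordings, the returned result carries segment-level timestamps in addition to the full text. The sketch below assumes the result object exposes a `segments` list with `start_seconds`, `end_seconds`, and `text` fields; check the reazonspeech documentation for the exact attribute names.

```python
from reazonspeech.nemo.asr import load_model, transcribe, audio_from_path

audio = audio_from_path("speech.wav")
model = load_model()
ret = transcribe(model, audio)

# Attribute names below (segments, start_seconds, end_seconds, text) are
# assumptions; consult the library documentation for the exact result API.
for segment in ret.segments:
    print(f"[{segment.start_seconds:8.2f} - {segment.end_seconds:8.2f}] {segment.text}")
```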
|
|
|
## License |
|
|
|
[Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)
|
|