---
license: apache-2.0
language:
- zh
metrics:
- accuracy
- cer
pipeline_tag: automatic-speech-recognition
tags:
- Paraformer
- FunASR
- ASR
---
## Introduction

[Paraformer](https://arxiv.org/abs/2206.08317) is a non-autoregressive end-to-end speech recognition model. Unlike the currently mainstream autoregressive models, non-autoregressive models output the target text for the entire sentence in parallel, which makes them particularly well suited to parallel inference on GPUs. Paraformer is currently the first known non-autoregressive model to match the accuracy of autoregressive end-to-end models on industrial-scale data. Combined with GPU inference, it can improve inference efficiency by a factor of 10, reducing machine costs for speech recognition cloud services by nearly 10x.
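
To make the contrast concrete, here is a minimal, hypothetical sketch (toy NumPy code, not FunASR internals): an autoregressive decoder needs one forward pass per output token, whereas a non-autoregressive decoder like Paraformer predicts every output position from a single parallel pass.

```python
import numpy as np

def mock_forward(feats, num_tokens=5, vocab=10):
    # Stand-in for an acoustic model: one call scores every output position
    # at once, returning logits of shape [num_tokens, vocab].
    rng = np.random.default_rng(0)
    return rng.random((num_tokens, vocab))

def non_autoregressive_decode(feats):
    # A single forward pass, no token-by-token loop: all positions are
    # argmax'd in parallel, which is what makes GPU batching so effective.
    logits = mock_forward(feats)
    return np.argmax(logits, axis=-1).tolist()

feats = np.random.rand(40, 80)  # toy input: 40 frames of 80-dim features
print(non_autoregressive_decode(feats))
```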

This repo shows how to use Paraformer with the `funasr_onnx` runtime. The model comes from [FunASR](https://github.com/alibaba-damo-academy/FunASR) and was trained on 60,000 hours of Mandarin data. Paraformer took first place on the [SpeechIO Leaderboard](https://github.com/SpeechColab/Leaderboard).

We have released a large number of industrial-grade models, covering speech recognition, voice activity detection, punctuation restoration, speaker verification, speaker diarization, and timestamp prediction (force alignment). If you are interested, please refer to [FunASR](https://github.com/alibaba-damo-academy/FunASR).


## Install funasr_onnx

```shell
pip install -U funasr_onnx
# For users in China, you can install from a mirror:
# pip install -U funasr_onnx -i https://mirror.sjtu.edu.cn/pypi/web/simple
```

## Download the model

```shell
git clone https://huggingface.co/funasr/paraformer-large
```

## Inference with runtime

### Speech Recognition
#### Paraformer
```python
from funasr_onnx import Paraformer

# Path to the cloned model repo (contains model.onnx, config.yaml, am.mvn)
model_dir = "./paraformer-large"
model = Paraformer(model_dir, batch_size=1, quantize=True)

# One or more wav files to transcribe
wav_path = ['./paraformer-large/asr_example.wav']

result = model(wav_path)
print(result)
```
- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, and `am.mvn`
- `batch_size`: `1` (default), the batch size used during inference
- `device_id`: `-1` (default), run inference on CPU. To run on a GPU, set it to the GPU id (make sure you have installed `onnxruntime-gpu`); see the sketch below
- `quantize`: `False` (default), load `model.onnx` from `model_dir`. If set to `True`, load `model_quant.onnx` from `model_dir` instead
- `intra_op_num_threads`: `4` (default), the number of threads used for intra-op parallelism on CPU

Input: audio to transcribe; supported types: `str`, `np.ndarray`, `List[str]`

Output: `List[str]`, the recognition results
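
Below is a hedged sketch combining these parameters for batched GPU inference and showing the `np.ndarray` input path. It assumes a machine with a CUDA GPU, `onnxruntime-gpu` installed, the `soundfile` package available for loading audio, and that `asr_example.wav` ships with the cloned repo.

```python
import soundfile as sf  # assumed available; only needed for the np.ndarray path
from funasr_onnx import Paraformer

model_dir = "./paraformer-large"

# Batched inference on GPU 0 with the non-quantized model; intra-op threads
# only affect the CPU portions of the pipeline. device_id=-1 stays on CPU.
model = Paraformer(
    model_dir,
    batch_size=4,
    device_id=0,
    quantize=False,
    intra_op_num_threads=4,
)

# Input as a list of file paths ...
print(model(["./paraformer-large/asr_example.wav"]))

# ... or as a raw waveform in an np.ndarray
# (the waveform should match the model's expected sample rate, typically 16 kHz)
waveform, sample_rate = sf.read("./paraformer-large/asr_example.wav", dtype="float32")
print(model(waveform))
```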




## Performance benchmark

Please refer to the [benchmark](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/python/benchmark_onnx.md).

## Citations

```bibtex
@inproceedings{gao2022paraformer,
  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
  booktitle={INTERSPEECH},
  year={2022}
}
```