tensorops committed on
Commit
5bf5697
1 Parent(s): 3584f7a

update model

README.md CHANGED
@@ -5,7 +5,7 @@ license: mit
  ## Distilled Medium Whisper ASR Model for Thai
 
  ### Model Description
- This is a distilled Automatic Speech Recognition (ASR) model, based on the Whisper architecture. It has been specifically tailored for Thai language speech recognition. The model features 2 decoder layers (vs 24 in teacher model) and has been distilled from a larger teacher model, focusing on enhancing performance and efficiency.
+ This is a distilled Automatic Speech Recognition (ASR) model, based on the Whisper architecture. It has been specifically tailored for Thai language speech recognition. The model features 4 decoder layers (vs 24 in teacher model) and has been distilled from a larger teacher model, focusing on enhancing performance and efficiency.
 
  #### Distillation Details
  - **Teacher Model**: Medium Whisper ASR model
@@ -18,10 +18,10 @@ This is a distilled Automatic Speech Recognition (ASR) model, based on the Whisp
 
  ### Model Performance
  - **DeepCut Tokenized WER on Common Voice 13 Test Set**:
- - Distilled Model: **17.2%**
+ - Distilled Model: **10.49%**
  - Teacher Model: **8.92%**
 
- Reducing the decoder layers to just 2 layers hurts WER significantly for Thai speech. Additional datasets for distillation or more decoder layers might improve the WER. More to come soon!
+ Additional datasets for distillation or more decoder layers might improve the WER. More to come soon!
 
  ### Intended Use
  This model is intended for use in applications requiring Thai language speech recognition.
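
To put the WER change in this README diff into perspective, the following is a minimal sketch that computes the absolute and relative improvement of the distilled model and its remaining gap to the teacher. The three WER values are taken from the diff above; the variable and function names are illustrative and not part of the repository.

```python
# WER values (in percent) as reported in the README diff above.
WER_DISTILLED_BEFORE = 17.2   # distilled model with 2 decoder layers
WER_DISTILLED_AFTER = 10.49   # distilled model with 4 decoder layers
WER_TEACHER = 8.92            # teacher model


def relative_reduction(old: float, new: float) -> float:
    """Return the fractional reduction going from `old` to `new`."""
    return (old - new) / old


# Absolute improvement in percentage points.
abs_gain = WER_DISTILLED_BEFORE - WER_DISTILLED_AFTER

# Relative improvement as a fraction of the earlier WER.
rel_gain = relative_reduction(WER_DISTILLED_BEFORE, WER_DISTILLED_AFTER)

# Remaining gap between the updated distilled model and the teacher.
gap_to_teacher = WER_DISTILLED_AFTER - WER_TEACHER

print(f"absolute WER reduction: {abs_gain:.2f} points")
print(f"relative WER reduction: {rel_gain:.1%}")
print(f"gap to teacher:         {gap_to_teacher:.2f} points")
```

Under these numbers, the update removes roughly 39% of the earlier distilled model's WER and leaves it about 1.57 points behind the teacher.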
config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "../../distil-whisper-medium-aug-th-init",
+ "_name_or_path": "./distil-whisper-medium-init",
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "apply_spec_augment": true,
@@ -17,7 +17,7 @@
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
- "decoder_layers": 2,
+ "decoder_layers": 4,
  "decoder_start_token_id": 50258,
  "dropout": 0.0,
  "encoder_attention_heads": 16,
distil-whisper/events.out.tfevents.1705422649.c175d7640af5.1725615.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:59683a04b97cfdaebbab875018dc06dd25b96e82190613b672dee6676d497fce
+ size 6288
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:fa97226df5323c516124e99226d2519e2d369c2b0dcabf78233e19e9e97d7409
- size 1577553712
+ oid sha256:280d23deb47a68a4173948b4759cc2d02884df78ef198594560c95a7e48fac10
+ size 1711916448
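
As a rough sanity check that the larger `model.safetensors` matches the `decoder_layers` change from 2 to 4, the sketch below relates the file-size delta to the number of added layers. The sizes and layer counts come from this diff. The 4-bytes-per-parameter figure is an assumption (float32 weights) that the diff does not state, and the file also carries a small metadata header, so the per-layer parameter count is only an estimate.

```python
# Git LFS pointer sizes (bytes) from the model.safetensors diff above.
OLD_SIZE_BYTES = 1_577_553_712
NEW_SIZE_BYTES = 1_711_916_448

# decoder_layers values from the config.json diff above.
OLD_DECODER_LAYERS = 2
NEW_DECODER_LAYERS = 4

# Assumption: weights are stored as float32 (not stated in the diff).
BYTES_PER_PARAM = 4

# Growth of the checkpoint and number of decoder layers added.
delta_bytes = NEW_SIZE_BYTES - OLD_SIZE_BYTES
added_layers = NEW_DECODER_LAYERS - OLD_DECODER_LAYERS

# Average growth attributable to each added decoder layer.
bytes_per_added_layer = delta_bytes / added_layers

# Approximate parameter count per added layer under the float32 assumption.
approx_params_per_layer = bytes_per_added_layer / BYTES_PER_PARAM

print(f"size increase:          {delta_bytes:,} bytes")
print(f"added decoder layers:   {added_layers}")
print(f"bytes per added layer:  {bytes_per_added_layer:,.0f}")
print(f"approx. params / layer: {approx_params_per_layer:,.0f}")
```

If the weights were stored in a 2-byte format instead, the estimated parameter count per layer would double, so the last figure should be read with that caveat in mind.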