JacobLinCool committed on
Commit 40e50ce • 1 Parent(s): d2aa6a6
Update README.md
README.md CHANGED
@@ -46,18 +46,22 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-
+This is an open-source Traditional Chinese (Taiwan) automatic speech recognition (ASR) model.
 
 ## Intended uses & limitations
 
-
+This model is designed to be a prompt-free ASR model for Traditional Chinese. Because it inherits Whisper's language identification (LID) system, which covers other Chinese variants under the same language token (`zh`), we expect that performance may degrade when transcribing Simplified Chinese.
+
+The model is free to use under the MIT license.
 
 ## Training and evaluation data
 
-
+This model was trained on the [Common Voice Corpus 19.0 Chinese (Taiwan) Subset](https://huggingface.co/datasets/JacobLinCool/common_voice_19_0_zh-TW), which contains about 50k training examples (44 hours) and 5k test examples (5 hours). This dataset is about four times the size of the combined training and validation sets (`train+validation`) of [mozilla-foundation/common_voice_16_1](https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1), which contain about 12k examples.
 
 ## Training procedure
 
+[TensorBoard](https://huggingface.co/JacobLinCool/whisper-large-v3-turbo-common_voice_19_0-zh-TW-lora/tensorboard)
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
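The dataset comparison in the training-data paragraph is plain arithmetic. The following Python sketch re-derives it from the approximate counts quoted in the README. It also computes the average clip length those counts imply. The constant and helper names are hypothetical, introduced only for illustration, and every input is a rounded "about" figure, so the outputs are approximate too.

```python
"""Sanity-check the approximate dataset figures quoted in the README.

All constants are the rounded "about" values from the model card, so the
results are approximate as well. Names here are illustrative only.
"""

# Common Voice 19.0 zh-TW subset (as quoted: ~50k train / 44 h, ~5k test / 5 h)
CV19_TRAIN_EXAMPLES = 50_000
CV19_TRAIN_HOURS = 44
CV19_TEST_EXAMPLES = 5_000
CV19_TEST_HOURS = 5

# mozilla-foundation/common_voice_16_1 `train+validation` (as quoted: ~12k)
CV16_TRAIN_VAL_EXAMPLES = 12_000


def size_ratio(larger: int, smaller: int) -> float:
    """Return how many times the size of `smaller` the `larger` count is."""
    return larger / smaller


def mean_clip_seconds(hours: float, examples: int) -> float:
    """Return the average clip length in seconds implied by total hours."""
    return hours * 3600 / examples


if __name__ == "__main__":
    ratio = size_ratio(CV19_TRAIN_EXAMPLES, CV16_TRAIN_VAL_EXAMPLES)
    train_clip = mean_clip_seconds(CV19_TRAIN_HOURS, CV19_TRAIN_EXAMPLES)
    test_clip = mean_clip_seconds(CV19_TEST_HOURS, CV19_TEST_EXAMPLES)
    print(f"train size vs. common_voice_16_1 train+validation: {ratio:.2f}x")
    print(f"mean train clip: {train_clip:.2f} s, mean test clip: {test_clip:.2f} s")
```

With the quoted figures this prints a ratio of 4.17x, consistent with "about four times the size". It also prints mean clip lengths of 3.17 s (train) and 3.60 s (test). Because the inputs are rounded, treat these as rough checks rather than exact dataset statistics.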