JacobLinCool/whisper-large-v3-turbo-common_voice_19_0-zh-TW-lora

Automatic Speech Recognition

Generated from Trainer


JacobLinCool committed about 1 month ago

Commit 40e50ce • 1 Parent(s): d2aa6a6

Update README.md

Files changed (1)

README.md +7 -3


@@ -46,18 +46,22 @@ It achieves the following results on the evaluation set:
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:

 ## Model description
+This is an open-source Traditional Chinese (Taiwan) automatic speech recognition (ASR) model.
 ## Intended uses & limitations
+This model is designed to be a prompt-free ASR model for Traditional Chinese. Because it inherits Whisper's language identification (LID) system, which supports other Chinese language variants under the same language token (`zh`), we expect performance to degrade when transcribing Simplified Chinese.
+The model is free to use under the MIT license.
 ## Training and evaluation data
+This model was trained on the [Common Voice Corpus 19.0 Chinese (Taiwan) Subset](https://huggingface.co/datasets/JacobLinCool/common_voice_19_0_zh-TW), which contains about 50k training examples (44 hours) and 5k test examples (5 hours). This dataset is about four times the size of the combined training and validation sets (`train+validation`) of [mozilla-foundation/common_voice_16_1](https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1), which together include about 12k examples.
 ## Training procedure
+Training logs are available on [TensorBoard](https://huggingface.co/JacobLinCool/whisper-large-v3-turbo-common_voice_19_0-zh-TW-lora/tensorboard).
 ### Training hyperparameters
 The following hyperparameters were used during training: