JairamKanna/pretrainedwhisper-medium-native-v2

Automatic Speech Recognition


JairamKanna committed on Dec 9, 2023

Commit 1b42acb • 1 Parent(s): 147bb05

Create README.md

Files changed (1)

README.md +59 -0

README.md ADDED

	@@ -0,0 +1,59 @@

+---
+datasets:
+- JairamKanna/Tamil-vulnerable-speech
+language:
+- ta
+metrics:
+- wer
+library_name: transformers
+pipeline_tag: automatic-speech-recognition
+---
+# Model Card for pretrainedwhisper-medium-native-v2
+<!-- Provide a quick summary of what the model is/does. -->
+This model is a fine-tuned version of the Whisper-large-v2 model for automatic speech recognition of Tamil speech from vulnerable individuals.
+#### Preprocessing [optional]
+#### Training Hyperparameters
+training_args = Seq2SeqTrainingArguments(
+    output_dir="./pretrainedwhisper-medium-native-v2",  # change to a repo name of your choice
+    per_device_train_batch_size=4,
+    gradient_accumulation_steps=1,  # increase by 2x for every 2x decrease in batch size
+    learning_rate=1e-5,
+    warmup_steps=200,
+    max_steps=2000,
+    gradient_checkpointing=True,
+    fp16=True,
+    evaluation_strategy="steps",
+    per_device_eval_batch_size=8,
+    predict_with_generate=True,
+    generation_max_length=225,
+    save_steps=500,
+    eval_steps=500,
+    logging_steps=25,
+    report_to=["tensorboard"],
+    load_best_model_at_end=True,
+    metric_for_best_model="wer",
+    greater_is_better=False,
+    push_to_hub=True,
+    optim="adamw_bnb_8bit"
+)
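The batch-size comment in the arguments above can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper name `effective_batch_size` and the single-device default are assumptions, since the card does not say how many GPUs were used. It shows why halving the per-device batch size while doubling gradient accumulation leaves the effective batch size unchanged, and how many examples and evaluation passes the listed `max_steps` and `eval_steps` imply.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer step.

    num_devices=1 is an assumption; the model card does not state the hardware.
    """
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values copied from the training arguments in the card.
PER_DEVICE_TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 1
MAX_STEPS = 2000
WARMUP_STEPS = 200
EVAL_STEPS = 500

batch = effective_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)

# Halving the batch size and doubling accumulation keeps the product constant.
same_batch = effective_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE // 2,
                                  GRADIENT_ACCUMULATION_STEPS * 2)

examples_seen = batch * MAX_STEPS            # training examples processed in total
warmup_fraction = WARMUP_STEPS / MAX_STEPS   # share of training spent in warmup
num_evaluations = MAX_STEPS // EVAL_STEPS    # evaluation passes during training

print(batch, same_batch, examples_seen, warmup_fraction, num_evaluations)
```

Under the single-device assumption, each optimizer step uses 4 examples, and the learning rate warms up over the first tenth of training.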
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+Word error rate (WER) is the evaluation metric used here; lower is better.
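For readers unfamiliar with the metric, the following is a minimal, dependency-free sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the card does not say which implementation or text normalization was used during training.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Tokenization is plain whitespace splitting, which is an assumption;
    real evaluations often normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word and one dropped word out of four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.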