Model Description
Pre-Training Settings:
166k samples from Common Voice 13.0 were recognized by Whisper tiny.en.
1,000 random samples were selected as the test set, and the rest were used for training and validation with an 80%/20% split (see the data-preparation sketch below).
Batch size: 256
Initial learning rate: 1e-5
Adam optimizer
30 epochs
Cross-entropy loss
Best checkpoint saved based on WER as the evaluation metric
Decoding is performed using beam search with a beam size of 5 (see the training sketch below)
S2S backbone model adopted from "Exploring data augmentation for code generation tasks".
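
A minimal sketch of the data preparation described above, assuming the Common Voice 13.0 clips are available locally and using the openai-whisper package; the `load_common_voice` helper and file layout are hypothetical placeholders, not part of the released code.

```python
# Hedged sketch of the data preparation: transcribe Common Voice 13.0 clips with
# Whisper tiny.en, hold out 1,000 random test samples, and split the rest 80/20.
# `load_common_voice` is a hypothetical loader returning (audio_path, gold_text) pairs.
import random
import whisper

def prepare_data(samples):
    model = whisper.load_model("tiny.en")
    # Pair each Whisper hypothesis with its gold transcript for correction training.
    pairs = [(model.transcribe(path)["text"].strip(), gold) for path, gold in samples]

    random.seed(42)
    random.shuffle(pairs)
    test = pairs[:1000]                 # 1,000 random samples held out as the test set
    rest = pairs[1000:]
    cut = int(0.8 * len(rest))          # remaining data: 80% train / 20% validation
    return rest[:cut], rest[cut:], test

# train, valid, test = prepare_data(load_common_voice("cv-corpus-13.0"))
```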
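
A PyTorch-style sketch of the pre-training loop under the hyperparameters listed above; `model`, the dataloaders, `pad_id`, `compute_wer`, and the `generate` beam-search interface are assumed placeholders for components not shown in this section.

```python
# Hedged sketch of the pre-training loop: Adam, initial LR 1e-5, cross-entropy loss,
# 30 epochs, batch size 256, best checkpoint kept by validation WER.
import torch

def pretrain(model, train_loader, valid_loader, pad_id, compute_wer):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
    criterion = torch.nn.CrossEntropyLoss(ignore_index=pad_id)

    best_wer = float("inf")
    for epoch in range(30):
        model.train()
        for src, tgt in train_loader:            # batches of 256 hypothesis/gold pairs
            logits = model(src, tgt[:, :-1])     # teacher-forced decoding
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             tgt[:, 1:].reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Decode the validation set with beam search (beam size 5) and track WER.
        model.eval()
        with torch.no_grad():
            hyps, refs = [], []
            for src, tgt in valid_loader:
                hyps.extend(model.generate(src, num_beams=5))  # assumed beam-search API
                refs.extend(tgt)
            wer = compute_wer(hyps, refs)

        if wer < best_wer:                       # save the best checkpoint by WER
            best_wer = wer
            torch.save(model.state_dict(), "best_checkpoint.pt")
    return best_wer
```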
Continue-Training Setting:
- 2 epochs on gold-gold pairs to prevent the over-correction problem on TED talk data (see the sketch below)
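
A short sketch of the continue-training stage, assuming "gold-gold" denotes pairs in which the gold transcript serves as both source and target, so the model learns to leave already-correct input untouched; `make_loader` and the reuse of the pre-training optimizer and loss are assumptions, not the authors' code.

```python
# Hedged sketch of continue-training on gold-gold pairs for only 2 epochs,
# to limit over-correction on TED talk data; `make_loader` is a placeholder.
def continue_train(model, optimizer, criterion, gold_texts, make_loader):
    gold_gold_loader = make_loader([(g, g) for g in gold_texts], batch_size=256)
    for epoch in range(2):                       # only 2 epochs
        model.train()
        for src, tgt in gold_gold_loader:
            logits = model(src, tgt[:, :-1])
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             tgt[:, 1:].reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```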