w2v-bert-cv-grain-lg_both

This model is a fine-tuned version of facebook/w2v-bert-2.0; the training dataset is not specified. Summary evaluation results are not listed here; per-epoch validation loss, WER, and CER are given in the training results table below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
num_epochs: 100
mixed_precision_training: Native AMP
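
The relationship between the batch-size entries and the linear schedule can be checked with a little arithmetic. The sketch below is illustrative only and is not the training script. It assumes a single device, no warmup (none is listed above), and 5406 optimizer steps per epoch, a figure read from the Step column of the results table. The helper names `effective_batch_size` and `linear_lr` are hypothetical.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 2
NUM_EPOCHS = 100

# Assumptions, not stated in the card: one device, no warmup, and
# 5406 optimizer steps per epoch (taken from the Step column of the table).
NUM_DEVICES = 1
WARMUP_STEPS = 0
STEPS_PER_EPOCH = 5406


def effective_batch_size(per_device: int, accumulation: int, devices: int) -> int:
    """Number of samples consumed per optimizer step."""
    return per_device * accumulation * devices


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_batch = effective_batch_size(
    TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES
)
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS

print(total_batch)  # 16, matching total_train_batch_size
print(total_steps)  # 540600 planned optimizer steps
print(linear_lr(75684, total_steps, LEARNING_RATE))  # roughly 4.3e-05 after epoch 14
```

Under these assumptions the learning rate would still be about 86% of its initial value at step 75684, the last step reported in the results table.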

Training Loss	Epoch	Step	Validation Loss	WER	CER
0.4609	1.0	5406	0.1400	0.1423	0.0296
0.2829	2.0	10812	0.1133	0.0968	0.0213
0.2369	3.0	16218	0.1033	0.0883	0.0193
0.2106	4.0	21624	0.0848	0.0681	0.0162
0.197	5.0	27030	0.0871	0.0681	0.0159
0.2459	6.0	32436	0.1335	0.1022	0.0203
0.3563	7.0	37842	0.1809	0.1254	0.0267
0.6033	8.0	43248	0.5575	0.7032	0.1768
4.656	9.0	48654	16.9063	0.9980	0.9837
10.5595	10.0	54060	12.4706	1.0	1.0
17.1148	11.0	59466	16.2280	1.0	1.0
17.4223	12.0	64872	16.2273	1.0	1.0
17.4172	13.0	70278	16.2222	1.0	1.0
17.4159	14.0	75684	16.2243	1.0	1.0
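
The table lists 14 of the configured 100 epochs, and its metrics degrade sharply after epoch 5. As a rough aid for choosing a checkpoint, the sketch below copies the rows into Python, reports the epoch with the lowest validation loss and CER, and finds the first epoch whose WER crosses a cut-off. The 0.5 cut-off, the `EpochResult` type, and both helper functions are hypothetical choices made for illustration and are not part of the original training setup.

```python
from typing import NamedTuple, Optional


class EpochResult(NamedTuple):
    epoch: int
    train_loss: float
    val_loss: float
    wer: float
    cer: float


# Rows copied from the results table above (Step column omitted).
RESULTS = [
    EpochResult(1, 0.4609, 0.1400, 0.1423, 0.0296),
    EpochResult(2, 0.2829, 0.1133, 0.0968, 0.0213),
    EpochResult(3, 0.2369, 0.1033, 0.0883, 0.0193),
    EpochResult(4, 0.2106, 0.0848, 0.0681, 0.0162),
    EpochResult(5, 0.197, 0.0871, 0.0681, 0.0159),
    EpochResult(6, 0.2459, 0.1335, 0.1022, 0.0203),
    EpochResult(7, 0.3563, 0.1809, 0.1254, 0.0267),
    EpochResult(8, 0.6033, 0.5575, 0.7032, 0.1768),
    EpochResult(9, 4.656, 16.9063, 0.9980, 0.9837),
    EpochResult(10, 10.5595, 12.4706, 1.0, 1.0),
    EpochResult(11, 17.1148, 16.2280, 1.0, 1.0),
    EpochResult(12, 17.4223, 16.2273, 1.0, 1.0),
    EpochResult(13, 17.4172, 16.2222, 1.0, 1.0),
    EpochResult(14, 17.4159, 16.2243, 1.0, 1.0),
]

# Hypothetical cut-off: treat a WER of 0.5 or more as a collapsed model.
WER_COLLAPSE_THRESHOLD = 0.5


def best_epoch(results: list, metric: str) -> int:
    """Epoch with the lowest value of the named metric (first one on ties)."""
    return min(results, key=lambda r: getattr(r, metric)).epoch


def first_collapsed_epoch(results: list, threshold: float) -> Optional[int]:
    """First epoch whose WER reaches the threshold, or None if none does."""
    for result in results:
        if result.wer >= threshold:
            return result.epoch
    return None


print(best_epoch(RESULTS, "val_loss"))  # 4
print(best_epoch(RESULTS, "cer"))  # 5
print(first_collapsed_epoch(RESULTS, WER_COLLAPSE_THRESHOLD))  # 8
```

Epochs 4 and 5 tie on WER at 0.0681, so which one counts as best depends on whether validation loss or CER is used as the tie-breaker.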