classla/wav2vec2-large-slavic-voxpopuli-v2_hr_SER
This model for Croatian speech emotion recognition (SER) is based on facebook/wav2vec2-large-slavic-voxpopuli-v2
and was fine-tuned on the CrES 2.1 dataset (Croatian Emotional Speech corpus).
If you use this model, please cite the following paper describing the dataset:
@inproceedings{Dropuljić_Chmura_Kolak_Petrinović_2011,
  title={Emotional speech corpus of Croatian language},
  ISSN={1845-5921},
  booktitle={2011 7th International Symposium on Image and Signal Processing and Analysis (ISPA)},
  author={Dropuljić, Branimir and Chmura, Miłosz Thomasz and Kolak, Antonio and Petrinović, Davor},
  year={2011},
  month={Sep},
  pages={95–100}
}
Metrics
Evaluation is performed on the dev and test portions of the CrES 2.1 dataset. The splits were created anew, stratified by emotion and with no speaker leakage (i.e. no speaker appears in more than one split).
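The speaker-disjoint requirement described above can be illustrated with a minimal sketch. This is not the script used to produce the actual CrES 2.1 splits: the function name `speaker_disjoint_split`, the toy speaker IDs, the emotion labels, and the 60/20/20 ratios are all assumptions made for illustration. The sketch assigns whole speakers to splits greedily, so no speaker can leak across splits; it does not enforce stratification by emotion, which in practice could be handled with a tool such as scikit-learn's `StratifiedGroupKFold`.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(samples, ratios, seed=0):
    """Assign whole speakers to splits so that no speaker is shared.

    samples: list of (speaker_id, emotion) tuples (hypothetical format).
    ratios:  dict mapping split name to its target fraction of samples.
    Returns a dict mapping split name to a list of sample indices.
    """
    by_speaker = defaultdict(list)
    for idx, (speaker, _emotion) in enumerate(samples):
        by_speaker[speaker].append(idx)

    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)
    # Place speakers with the most samples first (stable sort keeps the
    # shuffled order among ties) so the greedy fill is better balanced.
    speakers.sort(key=lambda s: len(by_speaker[s]), reverse=True)

    targets = {name: frac * len(samples) for name, frac in ratios.items()}
    splits = {name: [] for name in ratios}
    for speaker in speakers:
        # Pick the split that is currently furthest below its target size.
        name = max(splits, key=lambda n: targets[n] - len(splits[n]))
        splits[name].extend(by_speaker[speaker])
    return splits


# Toy data: 10 hypothetical speakers, 4 hypothetical emotions, 2 utterances each.
emotions = ["anger", "joy", "sadness", "neutral"]
samples = [
    (f"spk{s:02d}", emotion)
    for s in range(10)
    for emotion in emotions
    for _ in range(2)
]
splits = speaker_disjoint_split(samples, {"train": 0.6, "dev": 0.2, "test": 0.2})

# Speakers seen in each split; these sets should not overlap.
speakers_per_split = {
    name: {samples[i][0] for i in indices} for name, indices in splits.items()
}
```

Checking that the sets in `speakers_per_split` are pairwise disjoint is a quick way to confirm the absence of speaker leakage.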
accuracy | macro F1 | split |
---|---|---|
0.6796 | 0.6461 | test |
0.7277 | 0.7232 | dev |
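For reference, the two metrics in the table can be computed as in the sketch below. It is a plain-Python illustration, not the evaluation code behind the reported numbers, and the labels and predictions are made-up toy values. Macro F1 is the unweighted mean of the per-class F1 scores, so every emotion counts equally regardless of how many samples it has.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels and predictions (hypothetical, not taken from CrES 2.1).
y_true = ["anger", "anger", "joy", "joy", "neutral", "neutral"]
y_pred = ["anger", "joy", "joy", "joy", "neutral", "anger"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```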
Confusion matrix on test:
Training hyperparameters
In fine-tuning, the following arguments were used:
arg | value |
---|---|
per_device_train_batch_size | 2 |
per_device_eval_batch_size | 2 |
gradient_accumulation_steps | 2 |
num_train_epochs | 20 |
learning_rate | 1e-4 |
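The values above can be collected into a plain dictionary, as in the sketch below. The argument names match those of `TrainingArguments` in the Hugging Face `transformers` library, but the model card does not state which training script was used, so treating them as `TrainingArguments` fields is an assumption. The variable `num_devices` is likewise assumed, since the number of devices is not reported.

```python
# Fine-tuning arguments as listed in the table above.
finetune_args = {
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 20,
    "learning_rate": 1e-4,
}

# Assumption: a single device; the model card does not report the device count.
num_devices = 1

# Number of samples contributing to each optimizer step.
effective_batch_size = (
    finetune_args["per_device_train_batch_size"]
    * finetune_args["gradient_accumulation_steps"]
    * num_devices
)
print(f"effective train batch size: {effective_batch_size}")

# With transformers installed, the dictionary could be passed on like this
# (the output directory name is hypothetical):
#   from transformers import TrainingArguments
#   training_args = TrainingArguments(output_dir="ser_out", **finetune_args)
```

Under the single-device assumption, gradient accumulation over 2 steps with a per-device batch of 2 gives an effective batch of 4 samples per optimizer step.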