hatmanstack
metadata
license: apache-2.0
base_model: facebook/wav2vec2-large-xlsr-53
metrics:
  - accuracy
model-index:
  - name: audio-emotion-detection
    results: []

Audio Emotion Detection

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53.

It achieves the following results on the evaluation set:

  • Loss: 0.9555
  • Accuracy: 0.6262

Model description

The model classifies speech audio into one of seven emotion labels: Angry, Disgusted, Fearful, Happy, Neutral, Sad, and Surprised. It was trained on audio sampled at 16,000 Hz (16 kHz), so input audio should be resampled to that rate before inference.
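
The resampling requirement can be illustrated with a minimal, dependency-free sketch. The function name `resample_linear` is hypothetical and not part of this model's code; it uses plain linear interpolation with no anti-aliasing filter, so it is only meant to show the idea. In practice a dedicated audio library's resampler is likely a better choice.

```python
def resample_linear(samples, orig_rate, target_rate=16_000):
    """Resample a mono waveform by linear interpolation.

    Naive sketch: no low-pass (anti-aliasing) filter is applied,
    so downsampling real recordings this way can introduce aliasing.
    """
    if orig_rate <= 0 or target_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if orig_rate == target_rate or len(samples) < 2:
        return list(samples)

    # Number of output samples that cover the same duration.
    n_out = max(1, round(len(samples) * target_rate / orig_rate))
    step = orig_rate / target_rate  # input samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = min(i * step, last)      # fractional position in the input
        left = int(pos)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of silence recorded at 48 kHz becomes 16,000 samples.
one_second_48k = [0.0] * 48_000
resampled = resample_linear(one_second_48k, orig_rate=48_000)
print(len(resampled))  # 16000
```

The resulting list of floats, together with the 16,000 Hz rate, is the kind of input a wav2vec2 feature extractor expects.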

Training and evaluation data

  • mozilla-foundation/common_voice_6_0
  • speech-recognition-community-v2/dev_data

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 4
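
As a rough sketch of how these values relate, the snippet below recomputes the total batch size and the shape of a linear schedule with warmup. It assumes a single training device, a total of 160 optimizer steps (the final step listed under Training results), and that the warmup ratio is converted to steps by rounding up; the function names are illustrative and not taken from the training script.

```python
import math

LEARNING_RATE = 0.0005
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.01
TOTAL_STEPS = 160  # assumption: last step reported in the training results


def effective_batch_size(per_device=TRAIN_BATCH_SIZE,
                         accum=GRAD_ACCUM_STEPS,
                         num_devices=1):
    """Samples contributing to one optimizer update (single device assumed)."""
    return per_device * accum * num_devices


def warmup_steps(total_steps=TOTAL_STEPS, ratio=WARMUP_RATIO):
    """Warmup length in steps; rounding up is an assumption."""
    return math.ceil(total_steps * ratio)


def linear_lr(step, peak_lr=LEARNING_RATE, total_steps=TOTAL_STEPS, warmup=None):
    """Linear warmup from 0 to peak_lr, then linear decay to 0."""
    if warmup is None:
        warmup = warmup_steps(total_steps)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    remaining = (total_steps - step) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, remaining)


print(effective_batch_size())  # 256
print(warmup_steps())          # 2
for step in (0, 1, 2, 80, 160):
    print(step, linear_lr(step))
```

With a warmup ratio of 0.01 the warmup phase is very short under these assumptions, so the learning rate is at or near its peak almost from the start and decays for the rest of training.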

Training results

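| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 1.5875        | 1.0   | 40   | 1.2574          | 0.5133   |
| 1.1637        | 2.0   | 80   | 1.0852          | 0.5590   |
| 0.9827        | 3.0   | 120  | 1.0048          | 0.6090   |
| 0.8683        | 4.0   | 160  | 0.9555          | 0.6262   |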
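
To put these numbers in context, the sketch below compares them with a uniform-guess baseline over the seven labels and computes the accuracy gain per epoch. It assumes the reported loss is a standard cross-entropy and that the classes are roughly balanced; neither is stated in this card, so treat the comparison as indicative only.

```python
import math

# (training loss, epoch, step, validation loss, accuracy) from the results above
RESULTS = [
    (1.5875, 1, 40, 1.2574, 0.5133),
    (1.1637, 2, 80, 1.0852, 0.5590),
    (0.9827, 3, 120, 1.0048, 0.6090),
    (0.8683, 4, 160, 0.9555, 0.6262),
]
NUM_CLASSES = 7  # Angry, Disgusted, Fearful, Happy, Neutral, Sad, Surprised

# Baselines for a model that predicts every class with equal probability.
chance_accuracy = 1 / NUM_CLASSES
chance_loss = math.log(NUM_CLASSES)  # assumes cross-entropy loss


def accuracy_gains(rows):
    """Accuracy improvement between consecutive epochs."""
    accs = [row[4] for row in rows]
    return [round(later - earlier, 4) for earlier, later in zip(accs, accs[1:])]


final_val_loss, final_accuracy = RESULTS[-1][3], RESULTS[-1][4]
print(f"chance accuracy: {chance_accuracy:.4f}, final accuracy: {final_accuracy:.4f}")
print(f"chance loss: {chance_loss:.4f}, final validation loss: {final_val_loss:.4f}")
print("accuracy gain per epoch:", accuracy_gains(RESULTS))
```

Validation loss and accuracy are still improving at epoch 4, although the last gain is the smallest of the three.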