AudioConFit
Welcome to ConFit on the Hugging Face Hub
About Us
ConFit is an organisation dedicated to advancing speech, audio, and natural language processing (NLP). Our team develops technologies and tools for researchers and developers working in the audio and language domains.

We provide a collection of audio datasets prepared for machine learning applications. They are suited to training models on tasks such as audio embedding and speech recognition, are compatible with popular frameworks, and can be integrated into your projects. The tables below list each dataset with its split method, size, average clip duration, and sampling rate.
Datasets
Audio classification:
Dataset | Split Method | Classes | Task | # Clips | Average Duration (s) | Sampling Rate (Hz) |
---|---|---|---|---|---|---|
WMMS | train/test | 32 | Multi-class | 1697 | 10.42 | 16000 |
MSWC (English) | train/validation/test | 271 | Multi-class | 33726 | 0.99 | 16000 |
MSWC (Spanish) | train/validation/test | 146 | Multi-class | 11759 | 0.99 | 16000 |
MSWC (Indian) | train/validation/test | 14 | Multi-class | 739 | 0.99 | 16000 |
ESC50 | 5-fold | 50 | Multi-class | 2000 | 5.00 | 44100 |
UrbanSound8K | 10-fold | 10 | Multi-class | 8732 | 3.60 | 8000 |
AudioSet (balanced) | train/test | 527 | Multi-label | 39437 | 9.89 | 32000 |
MagnaTagATune | train/validation/test | 50 | Multi-label | 21108 | 29.12 | 16000 |
Medley-solos-DB | train/validation/test | 8 | Multi-class | 21571 | 2.97 | 44100 |
Pianos | train/validation/test | 8 | Multi-class | 668 | 4.86 | 16000 |
FSD-Kaggle-2019 (curated) | train/test | 80 | Multi-label | 9451 | 8.93 | 44100 |
GTZAN | train/validation/test | 10 | Multi-class | 930 | 30.02 | 22050 |
Nsynth (instrument) | train/validation/test | 11 | Multi-class | 305979 | 4.00 | 16000 |
Nsynth (pitch) | train/validation/test | 112 | Multi-class | 305979 | 4.00 | 16000 |
CREMA-D | train/validation/test | 6 | Multi-class | 7442 | 2.54 | 16000 |
IEMOCAP | 5-fold | 4 | Multi-class | 5531 | 4.52 | 16000 |
EmoDB | train/test | 7 | Multi-class | 535 | 2.77 | 16000 |
EMOVO | 6-fold | 7 | Multi-class | 588 | 3.12 | 48000 |
IRMAS | train/test | 11 | Multi-label | 9579 | 7.16 | 44100 |
RAVDESS | 5-fold | 8 | Multi-class | 2880 | 3.70 | 48000 |
DCASE2018-Task3 | train/test | 2 | Binary-class | 35690 | 10.01 | 44100 |
TIMIT | train/validation/test | 630 | Multi-class | 6300 | 3.07 | 16000 |
LibriSpeech | train/test | 2484 | Multi-class | 21933 | 3.75 | 16000 |
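
Several datasets in the table above (for example ESC50, IEMOCAP, and RAVDESS) are listed with a k-fold split method rather than fixed train/test splits. A minimal sketch of how such folds are typically used for cross-validation is shown here. It assumes each clip carries an integer fold label; the `fold_ids` list and the `fold_splits` helper are hypothetical names for illustration and are not part of any ConFit dataset schema.

```python
from typing import Iterator, List, Tuple


def fold_splits(fold_ids: List[int]) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held_out_fold, train_indices, test_indices) for each fold.

    fold_ids[i] is the fold label of clip i (hypothetical layout).
    """
    for held_out in sorted(set(fold_ids)):
        # Clips in the held-out fold form the test set; all others are used for training.
        train_idx = [i for i, f in enumerate(fold_ids) if f != held_out]
        test_idx = [i for i, f in enumerate(fold_ids) if f == held_out]
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    # Toy example: six clips spread over three folds.
    example_fold_ids = [1, 2, 3, 1, 2, 3]
    for fold, train_idx, test_idx in fold_splits(example_fold_ids):
        print(f"fold {fold}: train={train_idx} test={test_idx}")
```

Results on k-fold datasets are usually reported as the average of the metric over all held-out folds.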
Automated audio captioning:
Dataset | Split Method | # Clips | Average Duration (s) | Sampling Rate (Hz) |
---|---|---|---|---|
Music4All | train | 109269 | 29.99 | 48000 |
Clotho (v1.0) | train/test | 3938 | 22.43 | 44100 |
Clotho (v2.1) | train/validation/test | 8723 | 22.48 | 44100 |
AudioCaps | train/validation/test | 41113 | 8.38 | 48000 |
WavCaps (AudioSet-SL) | train | 85232 | 10.00 | 32000 |
WavCaps (SoundBible) | train | 1232 | 13.12 | 32000 |
WavCaps (BBC) | train | 31201 | 115.04 | 32000 |
Music, speech, and noise:
Dataset | Split Method | # Clips | Average Duration (s) | Sampling Rate (Hz) |
---|---|---|---|---|
MUSAN | train | 2016 | 195.16 | 16000 |
RIR-Noise | train | 61260 | 1.54 | 16000 |
ARCA23K | train | 17979 | 7.92 | 44100 |
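
The clip counts and average durations in the tables can be combined to estimate how much audio each dataset contains, which helps when budgeting storage or training time. The sketch below does this for the three datasets in the table above. It assumes the Average Duration column is given in seconds, which the tables do not state explicitly; `total_hours` is a hypothetical helper name.

```python
# (dataset name, number of clips, average duration), copied from the table above.
# Assumption: average duration is in seconds.
ROWS = [
    ("MUSAN", 2016, 195.16),
    ("RIR-Noise", 61260, 1.54),
    ("ARCA23K", 17979, 7.92),
]


def total_hours(num_clips: int, avg_duration_s: float) -> float:
    """Approximate total audio length in hours: clips x average seconds / 3600."""
    return num_clips * avg_duration_s / 3600.0


if __name__ == "__main__":
    for name, clips, avg in ROWS:
        print(f"{name}: about {total_hours(clips, avg):.1f} hours")
```

These figures are estimates only, since the average durations in the tables are rounded to two decimal places.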
Contact Us
If you have any questions or would like more information about our projects, please feel free to reach out to us.