ARTICLE = """
**Motivation**
In Africa, as on other continents, people access vital information mainly through their mobile phones. The need for voice-enabled applications therefore spans all sectors, from health and food to entertainment (games, social media).
Existing speech recognition services are not available in many African languages, so the speakers of these languages are excluded from the benefits of voice-enabled technologies.
This dataset will boost speech technologies (like speech-to-text, text-to-speech, speech translation, and modeling) for African languages, which have so far had few or no public datasets.
**Note:** This is a continuous effort, and this sprint is only the kick-off. Please feel free to share it with your family and friends, and keep recording.
**Benefits of such a dataset**
- A useful dataset for learning audio-related machine learning (automatic speech recognition, text-to-speech, and other types of speech processing).
- It can be used as a simple training and/or evaluation dataset for speech processing tasks.
- An easy dataset to train a model on and get good results. For example, you can train a model to recognize spoken numbers in your language.
- It opens up opportunities for more sophisticated speech processing models for African languages.
**What about license and security?**
- The safety and interests of the people recording come first. For that reason, we are exploring options such as a gated dataset ([this is an example of a gated dataset](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0)) to ensure anonymity and safety, as well as a better license for the dataset.
- If you have ideas for better privacy-enhancing processes, or for licensing that is more beneficial to the contributors, please reach out to me. My contact details are below.
**About the dataset**
- The data (metadata, text, and audio recordings) are uploaded to [a public Hugging Face dataset](https://huggingface.co/datasets/chrisjay/crowd-speech-africa). For code lovers, [this](https://huggingface.co/spaces/chrisjay/afro-speech/blob/main/app.py#L90-L106) is the part of our code that handles the upload.
- We do not collect your name, address, or other sensitive information.
- If you want your entry removed for any reason, please reach out by email.
- Your email, if given, is used only to track your progress so that prizes can be awarded to the top scorers. Emails are temporarily stored in [this private dataset](https://huggingface.co/datasets/chrisjay/african-digits-recording-sprint-email) and deleted immediately after the sprint.
**Contact**
For questions, issues, or anything else, contact Chris Emezue:
- Email: chris.emezue@gmail.com
- [Telegram](https://t.me/realchrisjay)
"""