ARTICLE = """
**Motivation**
In Africa, as on other continents, people access vital information mainly through their mobile phones. Voice-enabled applications are therefore needed in every sector, from health and food to leisure (games, social media).
Existing speech recognition services do not support many African languages, so speakers of these languages are excluded from the benefits of voice-enabled technologies.
This dataset aims to boost speech technologies (such as speech-to-text, text-to-speech, speech translation, and speech modeling) for African languages, which until now have had few or no public datasets.
**Note:** This is an ongoing effort, and this sprint is only the kick-off. Please share it with your family and friends, and keep recording.
**Benefits of such a dataset**
- A useful dataset for learning audio-related machine learning (automatic speech recognition, text-to-speech, and other types of speech processing).
- It can be used as a simple training and/or evaluation dataset for speech processing tasks.
- An easy dataset to train a model on and get good results. For example, you can train a model to recognize spoken numbers in your language.
- Opens up opportunities for more sophisticated speech processing models for African languages.
**What about licensing and security?**
- The safety and interests of the recorders come first. With that in mind, we are exploring options such as a gated dataset ([this is an example of a gated dataset](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0)) to ensure anonymity and safety, as well as a better license for the dataset.
- If you have ideas for better privacy-enhancing processes, or for a license that is more beneficial to contributors, please reach out to me. My contact details are below.
**About the dataset**
- The data (metadata, text, and audio recordings) are uploaded to [a public Hugging Face dataset](https://huggingface.co/datasets/chrisjay/crowd-speech-africa). For code lovers, [this](https://huggingface.co/spaces/chrisjay/afro-speech/blob/main/app.py#L90-L106) is the part of our code that handles the upload.
- We do not collect your name, address or other sensitive information.
- If for some reason you want to remove your entry, please reach out by email.
- Your email, if given, is used only to track your progress so that we can award prizes to the top scorers. Email addresses are stored temporarily in [this private dataset](https://huggingface.co/datasets/chrisjay/african-digits-recording-sprint-email) and deleted immediately after the sprint.
**Contact**
For questions, issues, or anything else, contact Chris Emezue at:
- Email: chris.emezue@gmail.com
- [Telegram](https://t.me/realchrisjay)
"""