alaleye committed
Commit 7107bfd
1 Parent(s): 1f83ba2

Update README.md

Files changed (1): README.md +57 -3
---
license: apache-2.0
language:
- am
- sw
- wo
- fon
tags:
- speech pretraining
- african
- fongbe
- swahili
- wolof
- wav2vec2
- amharic
---

# Wav2vec2-large

## Model description
This model is a pre-trained instance of the wav2vec 2.0 architecture, focused on four African languages: Fongbe, Swahili, Amharic, and Wolof. It leverages unlabelled audio data in these languages to learn rich, language-specific representations before any fine-tuning on downstream tasks.

## Training data

The model was pre-trained on a diverse set of audio recordings from the [ALFFA](https://github.com/getalp/ALFFA_PUBLIC) dataset.

* Fongbe: A Gbe language, primarily spoken in Benin and parts of Nigeria and Togo.
* Swahili: A Bantu language, widely spoken across East Africa, including Tanzania, Kenya, Uganda, Rwanda, and Burundi.
* Amharic: The official language of Ethiopia, belonging to the Semitic branch of the Afroasiatic language family.
* Wolof: Predominantly spoken in Senegal, The Gambia, and Mauritania.
31
+
32
+ ## Model Architecture
33
+ This model uses the large version of wav2vec 2.0 architecture developed by Facebook AI, which includes a multi-layer convolutional neural network that processes
34
+ raw audio signals to produce contextual representations. These representations are then used to predict the original audio input before any labels are provided,
35
+ following a self-supervised training methodology.
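To make the downsampling concrete: the convolutional encoder reduces 16 kHz audio by a factor of roughly 320, so each contextual representation covers about 20 ms of audio. A small sketch of the output-length arithmetic (kernel and stride values are the defaults of the Hugging Face `Wav2Vec2Config`; the exact checkpoint configuration may differ):

```python
# Default conv_kernel / conv_stride of the Hugging Face Wav2Vec2Config.
CONV_KERNELS = [10, 3, 3, 3, 3, 2, 2]
CONV_STRIDES = [5, 2, 2, 2, 2, 2, 2]

def num_feature_frames(num_samples: int) -> int:
    """How many latent frames the encoder emits for a raw waveform."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Standard conv output-length formula (no padding).
        length = (length - kernel) // stride + 1
    return length

print(num_feature_frames(16000))  # 1 s of 16 kHz audio -> 49 frames (~20 ms each)
```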

## Usage
This model is intended for Automatic Speech Recognition (ASR), audio classification, and other audio-related tasks in Fongbe, Swahili, Amharic, and Wolof. To fine-tune it on a specific task, load it via the Hugging Face Transformers library:

```python
from transformers import Wav2Vec2Processor, Wav2Vec2Model

processor = Wav2Vec2Processor.from_pretrained("your-username/wav2vec2-african-languages")
model = Wav2Vec2Model.from_pretrained("your-username/wav2vec2-african-languages")
```
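Wav2vec 2.0 models expect mono 16 kHz float input. As a rough sketch of getting an arbitrary-rate recording into that form (the synthetic 8 kHz tone stands in for a real file, and the naive linear-interpolation resampler is illustrative only; prefer a dedicated resampler such as `torchaudio.functional.resample` in practice):

```python
import numpy as np

def resample_linear(waveform, orig_sr, target_sr):
    """Naive resampling of a 1-D waveform via linear interpolation."""
    duration = len(waveform) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(waveform), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, waveform).astype(np.float32)

# Synthetic 1-second, 440 Hz tone at 8 kHz (telephone-quality sample rate).
sr_in, sr_out = 8000, 16000
t = np.arange(sr_in) / sr_in
audio_8k = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

audio_16k = resample_linear(audio_8k, sr_in, sr_out)
print(audio_16k.shape)  # (16000,): one second at 16 kHz
```

The resulting array can then be fed to the processor, e.g. `processor(audio_16k, sampling_rate=16000, return_tensors="pt")`.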

## Performance
The model's performance was evaluated using a held-out validation set of audio recordings. The effectiveness of the pre-trained representations was measured by how well they could be fine-tuned to specific tasks such as ASR. Note that detailed performance metrics will depend on the specifics of the fine-tuning process and the quality of the labeled data used.
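When reporting such metrics for ASR fine-tuning, the standard measure is the word error rate (WER): the word-level edit distance between the reference and the hypothesis transcript, divided by the reference length. A minimal reference implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("habari ya leo", "habari leo"))  # 1 deletion over 3 reference words -> 0.333...
```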

## Limitations
The model may perform unevenly across the four languages because different amounts of training data were available for each. Performance may also degrade on audio that differs significantly from the recordings seen during training (e.g., telephone-quality audio or noisy environments).