jq committed
Commit e25860a
1 Parent(s): 75df3cb

Update README.md

Files changed (1)
  1. README.md +51 -49
README.md CHANGED
@@ -8,68 +8,70 @@ metrics:
  model-index:
  - name: mms-lug
    results: []
+ datasets:
+ - Sunbird/salt
+ language:
+ - lg
+ - en
+ - ach
+ - teo
+ - lgg
+ - nyn
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
 
- # Sunbird - MMS Finetuned Models
 
- This model is a fine-tuned version of [facebook/mms-1b-all](https://huggingface.co/facebook/mms-1b-all) on the None dataset.
 
- ## Model description
 
- More information needed
 
- ## Intended uses & limitations
 
- More information needed
 
- ## Training and evaluation data
 
- More information needed
 
- ## Training procedure
 
- ### Training hyperparameters
 
- To Add
 
- ### Results
 
- | Language Adapter | WER (%) | CER (%) | Additional Details |
- |---------------------|--------:|--------:|---------------------|
- | **Luganda (Lug)** | | | |
- | Lug-Base | 0.25 | | |
- | Lug+5Gram LM | | | |
- | Lug+3Gram LM | | | |
- | Lug+English Combined| 0.12 | | |
- | **Acholi (Ach)** | | | |
- | Ach-Base | 0.34 | | |
- | Ach+3Gram LM | | | |
- | Ach+5Gram LM | | | |
- | Ach+English Combined| 0.18 | | |
- | **Lugbara (Lgg)** | | | |
- | Lgg-Base | | | |
- | Lgg+3Gram LM | | | |
- | Lgg+5Gram LM | | | |
- | Lgg+English Combined| 0.25 | | |
- | **Teso (Teo)** | | | |
- | Teo-Base | 0.39 | | |
- | Teo+3Gram LM | | | |
- | Teo+5Gram LM | | | |
- | Teo+English Combined| 0.29 | | |
- | **Nyankore (Nyn)** | | | |
- | Nyn-Base | 0.48 | | |
- | Nyn+3Gram LM | | | |
- | Nyn+5Gram LM | | | |
- | Nyn+English Combined| 0.29 | | |
 
- _Note: LM stands for Language Model. The `+3Gram LM` and `+5Gram LM` suffixes indicate models enhanced with trigram and five-gram language models, respectively._
 
- ### Framework versions
 
- - Transformers 4.32.0.dev0
- - Pytorch 2.0.1+cu117
- - Datasets 2.13.0
- - Tokenizers 0.13.3
 
+ # MMS speech recognition for Ugandan languages
 
+ This is a fine-tuned version of [facebook/mms-1b-all](https://huggingface.co/facebook/mms-1b-all)
+ for Ugandan languages, trained with the [SALT](https://huggingface.co/datasets/Sunbird/salt) dataset. The languages supported are:
 
+ | code | language |
+ | --- | --- |
+ | lug | Luganda |
+ | ach | Acholi |
+ | lgg | Lugbara |
+ | teo | Ateso |
+ | nyn | Runyankole |
 
+ For each language there are two adapters: one optimised for cases where the speech is only in that language,
+ and another for cases where code-switching with English is expected (a sketch of switching adapters follows the usage example below).
 
+ # Usage
 
+ Usage is the same as for the base model, though with different adapters available.
 
+ ```python
+ import torch
+ import transformers
+ import datasets
 
+ # Available adapters:
+ # ['lug', 'lug+eng', 'ach', 'ach+eng', 'lgg', 'lgg+eng',
+ #  'nyn', 'nyn+eng', 'teo', 'teo+eng']
+ language = 'lug'
 
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ model = transformers.Wav2Vec2ForCTC.from_pretrained(
+     'Sunbird/asr-mms-salt').to(device)
+ model.load_adapter(language)
 
+ processor = transformers.Wav2Vec2Processor.from_pretrained(
+     'Sunbird/asr-mms-salt')
+ processor.tokenizer.set_target_lang(language)
 
+ # Get some test audio
+ ds = datasets.load_dataset('Sunbird/salt', 'multispeaker-lug', split='test')
+ audio = ds[0]['audio']
+ sample_rate = ds[0]['sample_rate']
 
+ # Apply the model
+ inputs = processor(audio, sampling_rate=sample_rate, return_tensors="pt")
 
+ with torch.no_grad():
+     outputs = model(**inputs.to(device)).logits
 
+ ids = torch.argmax(outputs, dim=-1)[0]
+ transcription = processor.decode(ids)
 
+ print(transcription)
+ # ekikola ky'akasooli kyakyenvu wabula langi yakyo etera okuba eyaakitaka wansi
+ ```
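 
+ For speech that mixes one of these languages with English, load the corresponding
+ code-switching adapter instead. A minimal sketch, reusing the `model` and `processor`
+ objects from the example above; the adapter names are those listed in the comment there:
 
+ ```python
+ # Switch to the Luganda+English code-switching adapter.
+ language = 'lug+eng'
+ model.load_adapter(language)
+ processor.tokenizer.set_target_lang(language)
+ # Apply the model to new audio exactly as in the example above.
+ ```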