vrdn23 committed on
Commit
bd61780
1 Parent(s): 59c5059

Update README.md

Initial commit for README

Files changed (1)
  1. README.md +31 -0
README.md CHANGED
@@ -8,3 +8,34 @@ tags:
8
  - Grapheme-to-Phoneme
9
  ---
10
 
11
+ ## Model Summary
12
+
13
+ mini-bart-g2p is a seq2seq model based on the [BART architecture](https://arxiv.org/abs/1910.13461). We reduce the number of layers and attention heads relative to the original BART architecture so that the model can be trained reliably for the grapheme-to-phoneme conversion task.
14
+
15
+
16
+ ## Intended Uses
17
+
18
+ The input is expected to contain English words consisting of Latin letters and apostrophes. The model has been trained to take a single word at a time as input and may give unexpected results when fed multiple words separated by spaces. The [Hugging Face tokenizer config](https://huggingface.co/cisco-ai/mini-bart-g2p/blob/main/tokenizer.json) lowercases each word and separates its letters with spaces under the hood, so when using the model you can pass words as normal text, without inserting spaces between letters.
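
As a rough illustration of the preprocessing described above, the sketch below checks that an input is a single word made of Latin letters and apostrophes, and mimics the lowercasing and letter separation in plain Python. The helper names `is_valid_word` and `preview_normalization` are hypothetical and are not part of the model or its tokenizer. The real normalization lives in the tokenizer config and may differ in detail.

```python
import re

# Assumption: "Latin letters and apostrophe" means ASCII a-z / A-Z plus "'".
_WORD_RE = re.compile(r"[A-Za-z']+")


def is_valid_word(word: str) -> bool:
    """Return True if `word` is a single token of Latin letters/apostrophes."""
    return _WORD_RE.fullmatch(word) is not None


def preview_normalization(word: str) -> str:
    """Mimic the described normalization: lowercase, letters split by spaces."""
    if not is_valid_word(word):
        raise ValueError(f"expected a single English word, got {word!r}")
    return " ".join(word.lower())


print(preview_normalization("Don't"))  # d o n ' t
```

A check like this can be run before calling the model to catch inputs that contain spaces, digits, or punctuation.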
19
+
20
+ The model provides output in the form of phonemes along with their corresponding stress numbers.
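
To make the output format more concrete, here is a small sketch that splits a phoneme string into (phoneme, stress) pairs. It assumes space-separated ARPAbet symbols with a trailing stress digit on vowels, as in CMUDict (for example `HH AH0 L OW1` for "hello"). The exact string the model emits is not shown in this README, so treat both the format and the helper name `split_stress` as assumptions.

```python
from typing import List, Optional, Tuple


def split_stress(phonemes: str) -> List[Tuple[str, Optional[int]]]:
    """Split 'HH AH0 L OW1' into [('HH', None), ('AH', 0), ...]."""
    pairs: List[Tuple[str, Optional[int]]] = []
    for symbol in phonemes.split():
        if symbol[-1].isdigit():
            # Trailing digit is the stress marker (CMUDict-style convention).
            pairs.append((symbol[:-1], int(symbol[-1])))
        else:
            # Symbols without a digit carry no stress in this convention.
            pairs.append((symbol, None))
    return pairs


print(split_stress("HH AH0 L OW1"))
# [('HH', None), ('AH', 0), ('L', None), ('OW', 1)]
```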
21
+
22
+
23
+ ## How to Use
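
This section is empty in the README, so the following is only a minimal sketch of one way the model might be called, not the authors' documented usage. It assumes the checkpoint at `cisco-ai/mini-bart-g2p` loads with the standard `AutoTokenizer` and `AutoModelForSeq2SeqLM` classes from `transformers`. The helper names `split_into_words` and `phonemize` and the `max_new_tokens=32` cap are illustrative choices.

```python
from typing import List

MODEL_ID = "cisco-ai/mini-bart-g2p"  # repository linked from this README


def split_into_words(text: str) -> List[str]:
    """The model expects one word at a time, so split on whitespace first."""
    return text.split()


def phonemize(text: str) -> List[str]:
    """Return one decoded output string per word in `text`."""
    # Imported lazily so split_into_words works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Assumption: the checkpoint is compatible with the generic Auto classes.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    results = []
    for word in split_into_words(text):
        # One word per call avoids padding and matches the training setup.
        input_ids = tokenizer(word, return_tensors="pt")["input_ids"]
        # max_new_tokens is an arbitrary cap chosen for this sketch.
        output_ids = model.generate(input_ids, max_new_tokens=32)
        results.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    return results


# Example call (downloads the checkpoint on first use):
# phonemes = phonemize("hello world")
```

Feeding words one at a time follows the note under Intended Uses that multi-word inputs give unexpected results.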
24
+
25
+
26
+
27
+
28
+
29
+
30
+ ## Training
31
+ The mini-bart-g2p model was trained on a combination of the [LibriSpeech Alignments dataset](https://zenodo.org/records/2619474#.YuCdaC8r1ZF) and the [CMUDict dataset](https://github.com/cmusphinx/cmudict).
32
+
33
+
34
+ ## Limitations
35
+
36
+
37
+
38
+
39
+ ## License
40
+
41
+ The model is licensed under the [Apache 2.0 License](https://huggingface.co/cisco-ai/mini-bart-g2p/blob/main/LICENSE).