Commit a30282f by willwade (1 parent: d770799)

adding brief details

Files changed (1)
  1. README.md +7 -3
README.md CHANGED
@@ -9,8 +9,9 @@ tags:
  - assistive-technology
  - spoken
  datasets:
- - jfleg
- - daily_dialog
+ - jfleg
+ - daily_dialog
+ - leslyarun/c4_200m_gec_train100k_test25k
  ---
  # t5-small-spoken-typo
 
@@ -55,7 +56,10 @@ Then injecting typos from a range of places
 
  And then compressing versions of the sentences (i.e. removing spaces)- both correct and typod
 
- Next we would like to C4 200M model - or a subset of it at least
+ We have also provided the C4-200M-250K subset data and the JFLEG dataset for base grammar correction
+
+ Full script to build the [dataset is here](https://colab.research.google.com/drive/1VkKU9KKIWkWQZ-pPzdDFLeRnwFxdWUtq?usp=sharing)
+
 
 
  ## Developed by:
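
Note on the README text in this diff: the "compressing" step (removing spaces from both the correct and the typo'd sentences) could look roughly like the Python sketch below. This is an illustration only, not the script linked in the commit. The function names `compress` and `build_pairs`, the example sentences, and the choice of the correct sentence as the target of every pair are all assumptions.

```python
def compress(sentence: str) -> str:
    """Remove all whitespace, e.g. 'how are you' -> 'howareyou'."""
    return "".join(sentence.split())


def build_pairs(correct: str, typo: str) -> list[tuple[str, str]]:
    """Return (input, target) pairs for one sentence.

    Assumption: the target is always the correct, spaced sentence.
    """
    return [
        (typo, correct),               # typo'd input, spaces kept
        (compress(correct), correct),  # correct text, spaces removed
        (compress(typo), correct),     # typo'd text, spaces removed
    ]


# Hypothetical example sentence and a hand-made typo'd variant.
pairs = build_pairs("how are you today", "hwo are yuo today")
for source, target in pairs:
    print(f"{source!r} -> {target!r}")
```

Each source sentence yields three training inputs here; whether the real dataset also keeps the unmodified correct sentence as an input is not stated in the diff.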