roygan commited on
Commit
820e89a
1 Parent(s): e8e6b46

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -35,7 +35,7 @@ Good at handling NLT tasks, Chinese T5-small, use BertTokenizer and chinese vo
35
 
36
  对比T5-small,训练了它的中文版。为了更好适用于中文任务,我们仅使用BertTokenzier,和支持中英文的词表,并且使用了语料库自适应预训练(Corpus-Adaptive Pre-Training, CAPT)技术在悟道语料库(180G版本)继续预训练。预训练目标为破坏span。具体地,我们在预训练阶段中使用了[封神框架](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen)大概花费了8张A100约24小时。
37
 
38
- Compared with T5,-samll we implement its Chinese version. In order to use for chinese tasks, we use BertTokenizer and Chinese vocabulary, and Corpus-Adaptive Pre-Training (CAPT) on the WuDao Corpora (180 GB version). The pretraining objective is span corruption. Specifically, we use the [fengshen framework](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen) in the pre-training phase which cost about 24 hours with 8 A100 GPUs.
39
 
40
  ## 使用 Usage
41
 
 
35
 
36
  对比T5-small,训练了它的中文版。为了更好适用于中文任务,我们仅使用BertTokenzier,和支持中英文的词表,并且使用了语料库自适应预训练(Corpus-Adaptive Pre-Training, CAPT)技术在悟道语料库(180G版本)继续预训练。预训练目标为破坏span。具体地,我们在预训练阶段中使用了[封神框架](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen)大概花费了8张A100约24小时。
37
 
38
+ Compared with T5-samll, we implement its Chinese version. In order to use for chinese tasks, we use BertTokenizer and Chinese vocabulary, and Corpus-Adaptive Pre-Training (CAPT) on the WuDao Corpora (180 GB version). The pretraining objective is span corruption. Specifically, we use the [fengshen framework](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen) in the pre-training phase which cost about 24 hours with 8 A100 GPUs.
39
 
40
  ## 使用 Usage
41