---
license: apache-2.0
datasets:
- gsm8k
- ChilleD/SVAMP
- EleutherAI/asdiv
metrics:
- accuracy
---
# Model Card for mtd-codet5p-770m-py

We use Mix Thoughts Distillation (MTD) to distill the mathematical reasoning ability of gpt-3.5-turbo into CodeT5+-770m-py.

### Model Description

- **Developed by:** Xunyu Zhu
- **Model type:** encoder-decoder
- **Language(s) (NLP):** Python
- **License:** apache-2.0
- **Finetuned from model:** [Salesforce/codet5p-770m-py](https://huggingface.co/Salesforce/codet5p-770m-py)

## Uses

### Direct Use

This model can be loaded with the `AutoModelForSeq2SeqLM` class and uses the same tokenizer as the original [Salesforce/codet5p-770m-py](https://huggingface.co/Salesforce/codet5p-770m-py).

The reasoning format is selected by a prompt appended to the question:

- Append "Let’s break down the code step by step" to make the model generate a program (PoT).
- Append "Let's think step by step." to make the model generate a rationale (CoT).
- Append "System of linear equations: (Do not simplify)" to make the model generate a system of equations (EoT).

### PoT

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "zhuxunyu/mtd-codet5p-770m-py"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

question = "Question: Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?\nLet’s break down the code step by step\n"

inputs = tokenizer(question, max_length=256, padding="max_length", truncation=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_length=256)
generation = tokenizer.decode(output[0], skip_special_tokens=True)
```

### CoT

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "zhuxunyu/mtd-codet5p-770m-py"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

question = "Question: Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?\nLet's think step by step.\n"

inputs = tokenizer(question, max_length=256, padding="max_length", truncation=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_length=256)
generation = tokenizer.decode(output[0], skip_special_tokens=True)
```

### EoT

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "zhuxunyu/mtd-codet5p-770m-py"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

question = "Question: Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?\nSystem of linear equations: (Do not simplify)\n"

inputs = tokenizer(question, max_length=256, padding="max_length", truncation=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_length=256)
generation = tokenizer.decode(output[0], skip_special_tokens=True)
```

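The Mix_all setting reported under Results combines the three reasoning formats. The exact combination strategy is not spelled out in this card; one minimal sketch is a majority vote over the final answers parsed from the PoT, CoT, and EoT generations (the `final_answers` dict below is illustrative data, not real model output):

```python
from collections import Counter

def mix_vote(answers):
    """Majority vote over the final answers produced by the three
    reasoning formats; ties fall back to the first answer seen."""
    counts = Counter(answers.values())
    best, _ = counts.most_common(1)[0]
    return best

# Illustrative answers parsed from PoT / CoT / EoT generations
final_answers = {"pot": 18.0, "cot": 18.0, "eot": 16.0}
print(mix_vote(final_answers))  # 18.0
```

This assumes each format's output can be reduced to a single numeric answer first (e.g. by executing the PoT program or solving the EoT equations).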
## Training Details

### Training Data

We prompt gpt-3.5-turbo to generate reasoning processes for the questions in the GSM8K training set: each question is paired with 4 reasoning programs, 4 reasoning rationales, and 4 systems of equations. The questions and their corresponding reasoning processes then form the training dataset used to fine-tune the LM.

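The dataset construction described above can be sketched as follows. Here `teacher_generate` is a hypothetical placeholder for a gpt-3.5-turbo API call, not part of this repository:

```python
# Prompt suffixes for the three reasoning formats used in this card.
PROMPTS = {
    "pot": "Let’s break down the code step by step",
    "cot": "Let's think step by step.",
    "eot": "System of linear equations: (Do not simplify)",
}

def teacher_generate(question, style, n=4):
    # Hypothetical placeholder: would sample n reasoning processes
    # from gpt-3.5-turbo for the given question and format.
    return [f"<{style} sample {i} for: {question}>" for i in range(n)]

def build_examples(questions):
    """Pair each question+prompt with every teacher reasoning process."""
    examples = []
    for q in questions:
        for style, prompt in PROMPTS.items():
            for target in teacher_generate(q, style, n=4):
                examples.append({"input": f"{q}\n{prompt}\n", "target": target})
    return examples

data = build_examples(["Question: 2 + 3 = ?"])
print(len(data))  # 12 examples: 3 formats x 4 samples per question
```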
## Evaluation

### Results

Accuracy (%) on four math word problem benchmarks:

| Prompt | GSM8K | ASDiv | SVAMP | MultiArith |
| :-----: | :---: | :---: | :---: | :--------: |
| PoT | 50.34 | 55.20 | 51.60 | 88.33 |
| EoT | 48.21 | 52.81 | 55.70 | 70.16 |
| CoT | 25.47 | 29.67 | 23.30 | 46.50 |
| Mix_all | 50.56 | 55.34 | 52.30 | 88.83 |

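For PoT, accuracy is naturally computed by executing each generated program and comparing its result with the gold answer. A minimal sketch (the convention that the program stores its result in a variable named `answer` is an assumption, not documented in this card):

```python
def pot_accuracy(programs, gold_answers, tol=1e-4):
    """Execute each generated program and compare its `answer`
    variable (assumed convention) with the gold label."""
    correct = 0
    for prog, gold in zip(programs, gold_answers):
        env = {}
        try:
            exec(prog, env)  # caution: run untrusted code in a sandbox
            if abs(float(env["answer"]) - gold) < tol:
                correct += 1
        except Exception:
            pass  # malformed programs count as wrong
    return correct / len(programs)

# A program like those the PoT prompt elicits for the duck-egg question
sample = "eggs = 16 - 3 - 4\nanswer = eggs * 2"
print(pot_accuracy([sample], [18.0]))  # 1.0
```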
## Citation

**BibTeX:**

```
@misc{zhu2024improving,
      title={Improving Small Language Models' Mathematical Reasoning via Mix Thoughts Distillation},
      author={Xunyu Zhu and Jian Li and Yong Liu and Can Ma and Weiping Wang},
      year={2024},
      eprint={2401.11864},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```