---
license: mit
language:
- en
tags:
- t5
model-index:
- name: metro_t0pp_largepp
  results:
  - task:
      type: natural-language-inference
    dataset:
      type: super_glue
      name: RTE
      config: rte
      split: validation
    metrics:
      - type: accuracy
        value: 83.68231046931406
  - task:
      type: natural-language-inference
    dataset:
      type: super_glue
      name: CB
      config: cb
      split: validation
    metrics:
      - type: accuracy
        value: 74.8809523809524
  - task:
      type: natural-language-inference
    dataset:
      type: anli
      name: ANLI R1
      split: dev_r1
    metrics:
      - type: accuracy
        value: 46.84
  - task:
      type: natural-language-inference
    dataset:
      type: anli
      name: ANLI R2
      split: dev_r2
    metrics:
      - type: accuracy
        value: 40.373333333333335
  - task:
      type: natural-language-inference
    dataset:
      type: anli
      name: ANLI R3
      split: dev_r3
    metrics:
      - type: accuracy
        value: 44.949999999999996
  - task:
      type: coreference-resolution
    dataset:
      type: super_glue
      name: WSC
      config: wsc.fixed
      split: validation
    metrics:
      - type: accuracy
        value: 71.82692307692307
  - task:
      type: coreference-resolution
    dataset:
      type: winogrande
      name: Winogrande XL
      config: winogrande_xl
      split: validation
    metrics:
      - type: accuracy
        value: 62.74664561957379
  - task:
      type: multiple-choice-qa
    dataset:
      type: super_glue
      name: COPA
      config: copa
      split: validation
    metrics:
      - type: accuracy
        value: 92.625
  - task:
      type: multiple-choice-qa
    dataset:
      type: story_cloze
      name: StoryCloze 2016
      config: '2016'
      split: validation
    metrics:
      - type: accuracy
        value: 95.64938535542491
  - task:
      type: multiple-choice-qa
    dataset:
      type: hellaswag
      name: HellaSwag
      split: validation
    metrics:
      - type: accuracy
        value: 83.74327823142801
  - task:
      type: word-sense-disambiguation
    dataset:
      type: super_glue
      name: WiC
      config: wic
      split: validation
    metrics:
      - type: accuracy
        value: 70.4858934169279
---

Official repository: https://github.com/gonglinyuan/metro_t0

# METRO-T0

Paper: [Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers](https://arxiv.org/abs/2305.12567) (ACL 2023)

METRO-T0 is a T5-style text-to-text Transformer pretrained with model-generated pretraining signals and then prompt-finetuned on the family of public NLP tasks proposed in [T0](https://arxiv.org/abs/2110.08207).
METRO-T0 is highly parameter-efficient: for example, METRO-T0-Large++ (775M parameters) outperforms GPT-3 (175B parameters) and T0-3B (3B parameters) on a wide range of NLP tasks.

![The architecture of METRO-T0 during pretraining using BERT as the auxiliary model to generate signals](https://github.com/gonglinyuan/metro_t0/raw/main/assets/metro_t0_method.png)

![Prompt learning results of METRO-T0 versus our T0 baseline and T0-3B by Sanh et al. (2022) on 4 tasks in the T0 Eval benchmark. Each point denotes the accuracy using one prompt template; the blue point indicates the median accuracy of T0-3B over all templates. The plots for other tasks are in our paper.](https://github.com/gonglinyuan/metro_t0/raw/main/assets/metro_t0_selected_results.png)

## Use METRO-T0++-Large++

To use METRO-T0++-Large++ in PyTorch (Python 3.7+, PyTorch 1.12+, and transformers 4.17+ are required), refer to the code snippet below:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the model and tokenizer; trust_remote_code is needed for the custom model code in this repository
model = AutoModelForSeq2SeqLM.from_pretrained("gonglinyuan/metro_t0pp_largepp", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("gonglinyuan/metro_t0pp_largepp", trust_remote_code=True)

# Encode a zero-shot prompt and generate greedily
input_text = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
inputs = tokenizer([input_text], max_length=512, truncation=True, add_special_tokens=True, return_tensors="pt").input_ids
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # expected: positive
```
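For multiple-choice tasks such as those in the T0 Eval benchmark, an alternative to free-form generation is to score each answer choice by its likelihood under the model (rank classification, as in the T0 paper). The sketch below illustrates this; the `rank_choices` helper is hypothetical and not part of the official repository.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("gonglinyuan/metro_t0pp_largepp", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("gonglinyuan/metro_t0pp_largepp", trust_remote_code=True)

def rank_choices(prompt, choices):
    """Hypothetical helper: return the answer choice with the highest total log-likelihood."""
    enc = tokenizer([prompt], max_length=512, truncation=True, return_tensors="pt")
    scores = []
    for choice in choices:
        labels = tokenizer([choice], return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(input_ids=enc.input_ids, attention_mask=enc.attention_mask, labels=labels)
        # out.loss is the mean cross-entropy over target tokens; multiply by the target
        # length so that total (not per-token) log-likelihoods are compared across choices
        scores.append(-out.loss.item() * labels.shape[1])
    return max(zip(scores, choices))[1]

prompt = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
print(rank_choices(prompt, ["positive", "negative"]))  # expected: positive
```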

## Other METRO-T0 Models

|                    | # Parameters | Pretraining Data | Prompt-Finetuning Data |
|--------------------|--------------|------------------|------------------------|
| [METRO-T0-Base](https://huggingface.co/gonglinyuan/metro_t0_base)      | 226M         | Wikibook (16G)   | T0 Train               |
| [METRO-T0+-Base](https://huggingface.co/gonglinyuan/metro_t0p_base)     | 226M         | Wikibook (16G)   | T0+ Train              |
| [METRO-T0++-Base](https://huggingface.co/gonglinyuan/metro_t0pp_base)    | 226M         | Wikibook (16G)   | T0++ Train             |
| [METRO-T0-Base++](https://huggingface.co/gonglinyuan/metro_t0_basepp)    | 256M         | 160G corpus      | T0 Train               |
| [METRO-T0+-Base++](https://huggingface.co/gonglinyuan/metro_t0p_basepp)   | 256M         | 160G corpus      | T0+ Train              |
| [METRO-T0++-Base++](https://huggingface.co/gonglinyuan/metro_t0pp_basepp)  | 256M         | 160G corpus      | T0++ Train             |
| [METRO-T0-Large++](https://huggingface.co/gonglinyuan/metro_t0_largepp)   | 775M         | 160G corpus      | T0 Train               |
| [METRO-T0+-Large++](https://huggingface.co/gonglinyuan/metro_t0p_largepp)  | 775M         | 160G corpus      | T0+ Train              |
| [METRO-T0++-Large++](https://huggingface.co/gonglinyuan/metro_t0pp_largepp) | 775M         | 160G corpus      | T0++ Train             |

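All checkpoints in the table are loaded the same way; only the repository name changes. For example, to load the base-size METRO-T0 model (repository names as listed above):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Swap in any repository name from the table above
model_name = "gonglinyuan/metro_t0_base"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
```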

## Citation

If you find the code and models useful for your research, please cite the following paper:

```
@misc{gong2023modelgenerated,
      title={Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers}, 
      author={Linyuan Gong and Chenyan Xiong and Xiaodong Liu and Payal Bajaj and Yiqing Xie and Alvin Cheung and Jianfeng Gao and Xia Song},
      year={2023},
      eprint={2305.12567},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2305.12567}
}
```