---
base_model:
- google-t5/t5-small
---
1. Download the repo

```python
import os
import torch

from glob import glob
from transformers import AutoModelForSeq2SeqLM, AutoConfig

model_name = 'marsggbo/t5-small_dff2048_dmodel32_token-pattern-predictor_mixtral8x7bInstructv0.1_xsum'
# lm_head was modified, so ignore the resulting size mismatch;
# use_safetensors=False loads the .bin checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name, ignore_mismatched_sizes=True, use_safetensors=False
)
```

2. Build the model

```python
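home_path = os.path.expanduser('~')
num_classes = 32 * 8  # 32 layers, each with 8 experts

# Locate the downloaded .bin checkpoint in the local Hugging Face cache
ckpt_pattern = f"{home_path}/.cache/huggingface/hub/*{model_name.split('/')[-1]}/snapshots/*/*bin"
ckpt_paths = glob(ckpt_pattern)
if not ckpt_paths:
    raise FileNotFoundError(f"No checkpoint matches {ckpt_pattern}; run step 1 first")
ckpt_path = ckpt_paths[0]

# Rebuild the architecture from the config, then swap in the resized lm_head
model_config = AutoConfig.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_config(config=model_config)
model.lm_head = torch.nn.Linear(model.config.hidden_size, num_classes, bias=False)
model.load_state_dict(torch.load(ckpt_path, map_location='cpu'), strict=True)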
```
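
With the replaced `lm_head`, each output position carries `32*8 = 256` scores. The sketch below shows one way such a flat vector could be read back as per-layer expert choices. It assumes a layer-major layout (class index = `layer * 8 + expert`), which this card does not confirm, and the helper names `class_to_layer_expert` and `top_expert_per_layer` are hypothetical.

```python
NUM_LAYERS = 32   # from the snippet above: 32 layers
NUM_EXPERTS = 8   # from the snippet above: 8 experts per layer


def class_to_layer_expert(class_idx):
    """Map a flat class index to (layer, expert).

    ASSUMPTION: classes are laid out layer-major, i.e.
    class_idx = layer * NUM_EXPERTS + expert.
    """
    if not 0 <= class_idx < NUM_LAYERS * NUM_EXPERTS:
        raise ValueError(f"class index out of range: {class_idx}")
    return divmod(class_idx, NUM_EXPERTS)


def top_expert_per_layer(scores):
    """Return the highest-scoring expert index for each layer.

    `scores` is a flat sequence of NUM_LAYERS * NUM_EXPERTS numbers
    for a single token, under the same layer-major assumption.
    """
    if len(scores) != NUM_LAYERS * NUM_EXPERTS:
        raise ValueError(f"expected {NUM_LAYERS * NUM_EXPERTS} scores, got {len(scores)}")
    picks = []
    for layer in range(NUM_LAYERS):
        row = scores[layer * NUM_EXPERTS:(layer + 1) * NUM_EXPERTS]
        picks.append(max(range(NUM_EXPERTS), key=row.__getitem__))
    return picks
```

To use it with the model, convert the scores for one position to a plain list (for example with `.tolist()` on a 1-D tensor) and pass that to `top_expert_per_layer`.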