Commit ac88694 (1 parent: f189f6c), committed by stefan-insilico

Update README.md

README.md CHANGED
Previous README (old side of the diff). Removed lines are prefixed with `-`; removed lines whose text was not captured are omitted.

    @@ -2,11 +2,18 @@
     license: cc-by-nc-4.0
     ---
     ```python
     # Load model and tokenizer
    @@ -16,43 +23,172 @@ tokenizer = AutoTokenizer.from_pretrained("insilicomedicine/precious3-gpt", trus
     model = AutoModel.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
     ```
    -Step 2
     ```python
    -# Select device
    -if torch.cuda.is_available():
    -    device = f"cuda:0"
    -else:
    -    device = "cpu"
    -print(device)
     ```
    -Step 3
     ```python
     ```
    -```python
    -# Example input prompt
     ```
    -Step 5.
    -```python

Updated README (new side of the diff):

license: cc-by-nc-4.0
---

## Precious3-GPT

A multi-omics multi-species language model.

- **Developer**: [Insilico Medicine](https://insilico.com/precious)
- **License**: cc-by-nc-4.0
- **Model size**: 88.3 million parameters
- **Domain**: Biomedical
- **Base architecture**: [MPT](https://huggingface.co/mosaicml/mpt-7b)

## Quickstart
Precious3-GPT can be loaded and run as a standard causal language model through the transformers interface:

```python
# Load model and tokenizer
# ... (lines not shown in the diff)
model = AutoModel.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
```

However, to make the full functionality of the Precious3-GPT model convenient to use, we provide a handler.

### Run the model using the Precious3-GPT handler, step by step


**Step 1 - download the Precious3-GPT [handler.py](https://huggingface.co/insilicomedicine/precious3-gpt/blob/main/handler.py)**
```python
from handler import EndpointHandler
precious3gpt_handler = EndpointHandler()
```

**Step 2 - create input for the handler**

```python
import json
with open('./generation-configs/meta2diff.json', 'r') as f:
    config_data = json.load(f)

# prepare request configuration
request_config = {"inputs": config_data, "mode": "meta2diff", "parameters": {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50,
    "random_seed": 137
}}
```
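
The request assembly in Step 2 can also be wrapped in a small helper. The following is only a sketch: `build_request`, `DEFAULT_PARAMETERS` and `KNOWN_MODES` are hypothetical names that are not part of `handler.py`; the default values and mode names are copied from this README.

```python
import copy

# Default sampling parameters, copied from the request shown in Step 2.
DEFAULT_PARAMETERS = {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50,
    "random_seed": 137,
}

# Generation modes named in this README.
KNOWN_MODES = ("meta2diff", "diff2compound", "meta2diff2compound")


def build_request(inputs, mode="meta2diff", **overrides):
    """Hypothetical helper: assemble a request dict for the handler."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    unknown = set(overrides) - set(DEFAULT_PARAMETERS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Overrides replace individual defaults; the defaults dict is not mutated.
    parameters = {**DEFAULT_PARAMETERS, **overrides}
    return {"inputs": copy.deepcopy(inputs), "mode": mode, "parameters": parameters}


request = build_request({"tissue": ["lung"], "species": "human"}, temperature=0.5)
print(request["parameters"])
```

Rejecting unknown parameter names early is a design choice of this sketch; whether the handler itself ignores or rejects them is not stated here.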

**How Precious3-GPT will see the given request**
```text
[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><age_individ></age_individ><cell></cell><efo>EFO_0000768 </efo><datatype>expression </datatype><drug>curcumin </drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type></dataset_type><gender>m </gender><species>human </species>
```
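
The tagged layout of the prompt can be approximated in a few lines of Python. This is a sketch inferred from the `input` strings shown in the Examples section, not the handler's actual code: `FIELD_ORDER`, `render_field` and `render_prompt` are hypothetical, and the `<age_individ>` tag that appears in the prompt above is left out because those example strings do not contain it.

```python
# Tag order inferred from the "input" strings in the Examples section (an assumption).
FIELD_ORDER = (
    "tissue", "cell", "efo", "datatype", "drug", "dose", "time",
    "case", "control", "dataset_type", "gender", "species",
)


def render_field(name, value):
    """Wrap a value (a string or a list of strings) in <name>...</name> tags."""
    values = [value] if isinstance(value, str) else list(value)
    # Each non-empty value is followed by a single space, as in the examples.
    body = "".join(f"{v} " for v in values if v)
    return f"<{name}>{body}</{name}>"


def render_prompt(inputs):
    """Hypothetical re-creation of the prompt string; not the handler's code."""
    instructions = "".join(f"<{i}>" for i in inputs.get("instruction", []))
    fields = "".join(render_field(n, inputs.get(n, "")) for n in FIELD_ORDER)
    return f"[BOS]{instructions}{fields}"


example_inputs = {
    "instruction": ["disease2diff2disease", "age_group2diff2age_group"],
    "tissue": ["whole blood"],
    "case": "40.0-50.0",
    "dataset_type": "expression",
    "gender": "m",
    "species": "human",
}
print(render_prompt(example_inputs))
```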

**Step 3 - run Precious3-GPT**
```python
output = precious3gpt_handler(request_config)
```

**Handler output structure**
```json
{
    "output": {
        "up": List,
        "down": List
    },
    "mode": String, // Generation mode that was selected
    "message": "Done!", // or an error message
    "input": String // Input prompt that was passed
}
```
Note: If the ```mode``` is one that generates compounds, the output will contain ```compounds: List```.
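
A result with this structure can be unpacked as sketched below. `extract_signature` is a hypothetical helper, not part of `handler.py`; it assumes that a successful run reports `"Done!"` in `message` and that `up` and `down` may each hold gene names nested one level deep, as in the example outputs in this README.

```python
def extract_signature(result):
    """Hypothetical helper: return (up, down) gene lists from a handler result."""
    if result.get("message") != "Done!":
        raise RuntimeError(f"generation failed: {result.get('message')!r}")

    def flatten(items):
        # Accept both a flat list of genes and a list of gene lists.
        flat = []
        for item in items:
            if isinstance(item, (list, tuple)):
                flat.extend(item)
            else:
                flat.append(item)
        return flat

    output = result["output"]
    return flatten(output.get("up", [])), flatten(output.get("down", []))


# Shortened sample in the documented structure; the prompt text is abbreviated.
sample = {
    "output": {"up": [["IER3", "APOC2"]], "down": [["TBL1Y", "TDP1"]]},
    "mode": "meta2diff",
    "message": "Done!",
    "input": "[BOS]...",
}
up_genes, down_genes = extract_signature(sample)
print(up_genes, down_genes)
```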

---
## Precious3-GPT request configuration

### Generation Modes (`mode` in config)

Choose the appropriate mode based on your requirements:

1. **meta2diff**: Generate a signature (lists of up- and down-regulated genes) from meta-data such as tissue, compound, gender, etc.
2. **diff2compound**: Predict compounds based on a signature.
3. **meta2diff2compound**: Generate signatures from meta-data, then predict compounds based on the generated signatures.

---

### Instruction (`inputs.instruction` in config)

1. disease2diff2disease - generate a signature for a disease, or predict a disease from a given signature
2. compound2diff2compound - generate a signature for a compound, or predict a compound from a given signature
3. age_group2diff2age_group - generate a signature for an age group, or predict an age group from a given signature
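
A request's instruction list can be checked against these three names before it is sent. This is a minimal sketch; `KNOWN_INSTRUCTIONS` and `check_instructions` are hypothetical and only mirror the list above.

```python
# Instruction names listed in this README.
KNOWN_INSTRUCTIONS = frozenset({
    "disease2diff2disease",
    "compound2diff2compound",
    "age_group2diff2age_group",
})


def check_instructions(instructions):
    """Return the instructions as a list, or raise on an unknown name."""
    unknown = [name for name in instructions if name not in KNOWN_INSTRUCTIONS]
    if unknown:
        raise ValueError(f"unknown instruction(s): {unknown}")
    return list(instructions)


checked = check_instructions(["disease2diff2disease", "age_group2diff2age_group"])
print(checked)
```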


### Other meta-data (`inputs.` in config)

1. Age (```age```): in years for human; in days for macaque and mouse
2.
The full list of available values for each meta-data item can be found in ```p3_entities_with_type.csv```.


## Examples

In the following examples, all possible configuration fields are specified. Any meta-data field in the ```inputs``` section can be left as an empty string (```""```) or an empty list (```[]```).
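
One way to avoid typing every empty field is to start from an all-empty template and override only what is needed. This is a sketch: `EMPTY_INPUTS` and `make_inputs` are hypothetical, and the field names and empty-value types are taken from the example configurations in this README.

```python
import copy

# Field names and empty values as they appear in the example configurations.
EMPTY_INPUTS = {
    "instruction": [], "tissue": [], "age": "", "cell": "", "efo": "",
    "datatype": "", "drug": "", "dose": "", "time": "", "case": "",
    "control": "", "dataset_type": "", "gender": "", "species": "",
    "up": [], "down": [],
}


def make_inputs(**fields):
    """Hypothetical helper: fill the empty template with the given fields."""
    unknown = set(fields) - set(EMPTY_INPUTS)
    if unknown:
        raise KeyError(f"unknown input field(s): {sorted(unknown)}")
    inputs = copy.deepcopy(EMPTY_INPUTS)  # keep the template itself untouched
    inputs.update(fields)
    return inputs


inputs = make_inputs(
    instruction=["disease2diff2disease"],
    tissue=["lung"],
    efo="EFO_0000768",
    gender="m",
    species="human",
)
print(inputs)
```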

_**Example 1**_

To generate a signature from specific meta-data, use the following configuration. Note that the ```up``` and ```down``` fields are empty lists because they are what you want to generate.
Here we ask the model to generate a signature for a male human in the 70-90 years age group, in lung tissue, with disease EFO_0000768.

```json
{
    "inputs": {
        "instruction": ["age_group2diff2age_group", "disease2diff2disease", "compound2diff2compound"],
        "tissue": ["lung"],
        "age": "",
        "cell": "",
        "efo": "EFO_0000768",
        "datatype": "", "drug": "", "dose": "", "time": "", "case": ["70.0-80.0", "80.0-90.0"], "control": "", "dataset_type": "expression", "gender": "m", "species": "human", "up": [], "down": []
    },
    "mode": "meta2diff",
    "parameters": {
        "temperature": 0.8, "top_p": 0.2, "top_k": 3550, "n_next_tokens": 50, "random_seed": 137
    }
}
```

Here is the output:
```json
{
    "output": {
        "up": [["PTGDR2", "CABYR", "MGAM", "TMED9", "SHOX2", "MAT1A", "MUC5AC", "GASK1B", "CYP1A2", "RP11-266K4.9", ...]], // generated list of up-regulated genes
        "down": [["MB", "OR10V1", "OR51H1", "GOLGA6L10", "OR6M1", "CDX4", "OR4C45", "SPRR2A", "SPDYE9", "GBX2", "ATP4B", ...]] // generated list of down-regulated genes
    },
    "mode": "meta2diff", // generation mode we specified
    "message": "Done!",
    "input": "[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><cell></cell><efo>EFO_0000768 </efo><datatype></datatype><drug></drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>", // actual input prompt for the model
    "random_seed": 137
}
```

_**Example 2**_

Now let's generate a signature for a healthy male human in the 40-50 years age group, in whole blood tissue.
Note that here we use the ```disease2diff2disease``` instruction but expect a signature for a healthy human, which is why ```efo``` is set to an empty string (```""```).
Alternatively, one more instruction can be added for this example, as in the configuration below: ```"instruction": ["disease2diff2disease", "age_group2diff2age_group"]```

```json
{
    "inputs": {
        "instruction": ["disease2diff2disease", "age_group2diff2age_group"],
        "tissue": ["whole blood"],
        "age": "",
        "cell": "",
        "efo": "",
        "datatype": "", "drug": "", "dose": "", "time": "", "case": "40.0-50.0", "control": "", "dataset_type": "expression", "gender": "m", "species": "human", "up": [],
        "down": []
    },
    "mode": "meta2diff",
    "parameters": {
        "temperature": 0.8,
        "top_p": 0.2,
        "top_k": 3550,
        "n_next_tokens": 50,
        "random_seed": 137
    }
}
```

Here is the output:
```json
{
    "output": {
        "up": [["IER3", "APOC2", "EDNRB", "JAKMIP2", "BACE2", ... ]],
        "down": [["TBL1Y", "TDP1", "PLPP4", "CPEB1", "ITPR3", ... ]]
    },
    "mode": "meta2diff",
    "message": "Done!",
    "input": "[BOS]<disease2diff2disease><age_group2diff2age_group><tissue>whole blood </tissue><cell></cell><efo></efo><datatype></datatype><drug></drug><dose></dose><time></time><case>40.0-50.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>",
    "random_seed": 137
}
```