license: cc-by-nc-4.0
Precious3-GPT
A multi-omics multi-species language model.
- Developer: Insilico Medicine
- License: cc-by-nc-4.0
- Model size: 88.3 million parameters
- Domain: Biomedical
- Base architecture: MPT
Quickstart
Precious3-GPT can be loaded and run as a standard causal language model through the transformers interface like this:
# Load model and tokenizer
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
model = AutoModel.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
However, to make the full functionality of the Precious3-GPT model easier to use, we provide a handler.
Run the model using the Precious3-GPT handler step by step
Step 1 - download the Precious3-GPT handler.py
from handler import EndpointHandler
precious3gpt_handler = EndpointHandler()
Step 2 - create input for the handler
import json
with open('./generation-configs/meta2diff.json', 'r') as f:
    config_data = json.load(f)
# prepare request configuration
request_config = {"inputs": config_data, "mode": "meta2diff", "parameters": {
"temperature": 0.8,
"top_p": 0.2,
"top_k": 3550,
"n_next_tokens": 50,
"random_seed": 137
}}
How Precious3-GPT will see the given request
[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><age_individ></age_individ><cell></cell><efo>EFO_0000768 </efo><datatype>expression </datatype><drug>curcumin </drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type></dataset_type><gender>m </gender><species>human </species>
Step 3 - run Precious3-GPT
output = precious3gpt_handler(request_config)
Handler output structure
{
"output": {
"up": List,
"down": List
},
"mode": String, // generation mode that was selected
"message": "Done!", // or an error message
"input": String // input prompt that was passed to the model
}
Note: If the selected mode generates compounds, the output also contains compounds: List.
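A minimal sketch of how the output structure above could be consumed. The helper name `unpack_handler_output` is hypothetical and not part of handler.py; it only assumes the keys shown above ("output", "message", and the optional "compounds" list).

```python
def unpack_handler_output(result):
    """Return the generated lists from a handler response.

    Hypothetical helper: it relies only on the documented keys and treats
    any message other than "Done!" as an error.
    """
    if result.get("message") != "Done!":
        raise RuntimeError(f"Handler reported a problem: {result.get('message')!r}")
    generated = result["output"]
    # "compounds" is only present for modes that generate compounds.
    return {key: generated[key] for key in ("up", "down", "compounds") if key in generated}


# Usage with a hand-written response shaped like the structure above
example_result = {
    "output": {"up": [["IER3", "APOC2"]], "down": [["TBL1Y", "TDP1"]]},
    "mode": "meta2diff",
    "message": "Done!",
    "input": "[BOS]...",
}
lists = unpack_handler_output(example_result)
print(sorted(lists))
```

Checking "message" first keeps a failed generation from being mistaken for an empty signature.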
Precious3-GPT request configuration
Generation modes (mode in config)
Choose the appropriate mode based on your requirements:
- meta2diff: Generate signature (up- and down- gene lists) given meta-data such as tissue, compound, gender, etc.
- diff2compound: Predict compounds based on signature.
- meta2diff2compound: Generate signatures given meta-data and then predict compounds based on generated signatures.
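As a sketch, a request can be assembled with a small helper that rejects a misspelled mode before the handler is called. `build_request` and `KNOWN_MODES` are hypothetical names; the mode list is just the three modes listed above, and the default parameter values are copied from the request shown in Step 2.

```python
KNOWN_MODES = ("meta2diff", "diff2compound", "meta2diff2compound")


def build_request(inputs, mode, temperature=0.8, top_p=0.2, top_k=3550,
                  n_next_tokens=50, random_seed=137):
    """Assemble a request dict in the shape used in Step 2 (hypothetical helper)."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {KNOWN_MODES}")
    return {
        "inputs": inputs,
        "mode": mode,
        "parameters": {
            "temperature": temperature,
            "top_p": top_p,
            "top_k": top_k,
            "n_next_tokens": n_next_tokens,
            "random_seed": random_seed,
        },
    }


request = build_request({"tissue": ["lung"], "up": [], "down": []}, "meta2diff")
print(request["mode"], request["parameters"]["random_seed"])
```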
Instruction (inputs.instruction in config)
- disease2diff2disease - generate signature for disease / predict disease based on given signature
- compound2diff2compound - generate signature for compound / predict compound based on given signature
- age_group2diff2age_group - generate signature for age group / predict age group based on signature
Other meta-data (inputs. in config)
The full list of available values for each meta-data item can be found in p3_entities_with_type.csv.
Examples
In the following examples all possible configuration fields are specified. You can leave some meta-data fields in the inputs section as an empty string ("") or an empty list ([]).
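One way to avoid retyping every empty field is to start from an all-empty template and override only what you need. This is a sketch: `EMPTY_INPUTS` and `make_inputs` are hypothetical names, and the field names and empty-value types are taken from the example configurations in this section.

```python
import copy

# Every field that appears in the example configurations, left empty.
EMPTY_INPUTS = {
    "instruction": [], "tissue": [], "age": "", "cell": "", "efo": "",
    "datatype": "", "drug": "", "dose": "", "time": "", "case": "",
    "control": "", "dataset_type": "", "gender": "", "species": "",
    "up": [], "down": [],
}


def make_inputs(**overrides):
    """Return a full inputs dict with the given fields filled in (hypothetical helper)."""
    unknown = set(overrides) - set(EMPTY_INPUTS)
    if unknown:
        raise KeyError(f"Unknown meta-data fields: {sorted(unknown)}")
    inputs = copy.deepcopy(EMPTY_INPUTS)  # never mutate the shared template
    inputs.update(overrides)
    return inputs


inputs = make_inputs(
    instruction=["disease2diff2disease"],
    tissue=["whole blood"],
    gender="m",
    species="human",
)
print(inputs["tissue"], repr(inputs["efo"]))
```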
Example 1
If you want to generate a signature given specific meta-data, you can use the following configuration. Note that the up and down fields are empty lists because you want the model to generate them.
Here we ask the model to generate a signature for a male human in the 70-90 age group, in lung tissue, with disease EFO_0000768.
{
"inputs": {
"instruction": ["age_group2diff2age_group", "disease2diff2disease", "compound2diff2compound"],
"tissue": ["lung"],
"age": "",
"cell": "",
"efo": "EFO_0000768",
"datatype": "",
"drug": "",
"dose": "",
"time": "",
"case": ["70.0-80.0", "80.0-90.0"],
"control": "",
"dataset_type": "expression",
"gender": "m",
"species": "human",
"up": [],
"down": []
},
"mode": "meta2diff",
"parameters": {
"temperature": 0.8,
"top_p": 0.2,
"top_k": 3550,
"n_next_tokens": 50,
"random_seed": 137
}
}
Here is the output:
{
"output": {
"up": [["PTGDR2", "CABYR", "MGAM", "TMED9", "SHOX2", "MAT1A", "MUC5AC", "GASK1B", "CYP1A2", "RP11-266K4.9", ...]], // generated list of up-regulated genes
"down": [["MB", "OR10V1", "OR51H1", "GOLGA6L10", "OR6M1", "CDX4", "OR4C45", "SPRR2A", "SPDYE9", "GBX2", "ATP4B", ...]] // generated list of down-regulated genes
},
"mode": "meta2diff", // generation mode we specified
"message": "Done!",
"input": "[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><cell></cell><efo>EFO_0000768 </efo><datatype></datatype><drug></drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>", // actual input prompt for the model
"random_seed": 137
}
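The "input" string in the output above suggests how the inputs section is flattened into a prompt: instructions become opening tags, and each meta-data field becomes a tag pair whose values are each followed by a space. The sketch below reproduces that string for Example 1. It is inferred from the example output only and is not the handler's actual implementation; `FIELD_ORDER` and `sketch_prompt` are hypothetical names, and the field order is an assumption read off the example.

```python
# Field order as it appears in the Example 1 prompt (an assumption).
FIELD_ORDER = ["tissue", "cell", "efo", "datatype", "drug", "dose", "time",
               "case", "control", "dataset_type", "gender", "species"]


def sketch_prompt(inputs):
    """Approximate the prompt string built from an inputs dict (illustration only)."""
    parts = ["[BOS]"]
    parts += [f"<{name}>" for name in inputs.get("instruction", [])]
    for field in FIELD_ORDER:
        value = inputs.get(field, "")
        values = [value] if isinstance(value, str) else list(value)
        # Each non-empty value is followed by one space; empty fields stay empty.
        body = "".join(f"{v} " for v in values if v != "")
        parts.append(f"<{field}>{body}</{field}>")
    return "".join(parts)


example_1_inputs = {
    "instruction": ["age_group2diff2age_group", "disease2diff2disease", "compound2diff2compound"],
    "tissue": ["lung"], "age": "", "cell": "", "efo": "EFO_0000768",
    "datatype": "", "drug": "", "dose": "", "time": "",
    "case": ["70.0-80.0", "80.0-90.0"], "control": "",
    "dataset_type": "expression", "gender": "m", "species": "human",
    "up": [], "down": [],
}
prompt = sketch_prompt(example_1_inputs)
print(prompt)
```

Comparing such a sketch against the "input" field returned by the handler is a quick way to confirm that a configuration was interpreted as intended.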
Example 2
Now let's generate a signature for a healthy male human in the 40-50 age group, in whole blood.
Note that here we use the disease2diff2disease instruction, but we expect to generate a signature for a healthy human, which is why we set efo to an empty string "".
Alternatively, we can add one more instruction to this example, as in the configuration below: "instruction": ["disease2diff2disease", "age_group2diff2age_group"]
{
"inputs": {
"instruction": ["disease2diff2disease", "age_group2diff2age_group"],
"tissue": ["whole blood"],
"age": "",
"cell": "",
"efo": "",
"datatype": "",
"drug": "",
"dose": "",
"time": "",
"case": "40.0-50.0",
"control": "",
"dataset_type": "expression",
"gender": "m",
"species": "human",
"up": [],
"down": []
},
"mode": "meta2diff",
"parameters": {
"temperature": 0.8,
"top_p": 0.2,
"top_k": 3550,
"n_next_tokens": 50,
"random_seed": 137
}
}
Here is the output:
{
"output": {
"up": [["IER3", "APOC2", "EDNRB", "JAKMIP2", "BACE2", ... ]],
"down": [["TBL1Y", "TDP1", "PLPP4", "CPEB1", "ITPR3", ... ]]
},
"mode": "meta2diff",
"message": "Done!",
"input": "[BOS]<disease2diff2disease><age_group2diff2age_group><tissue>whole blood </tissue><cell></cell><efo></efo><datatype></datatype><drug></drug><dose></dose><time></time><case>40.0-50.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>",
"random_seed": 137
}