---
license: cc-by-nc-4.0
---

## Precious3-GPT

A multi-omics multi-species language model.

- **Developer**: [Insilico Medicine](https://insilico.com/precious)
- **License**: cc-by-nc-4.0
- **Model size**: 88.3 million parameters
- **Domain**: Biomedical
- **Base architecture**: [MPT](https://huggingface.co/mosaicml/mpt-7b)

## Quickstart

Precious3-GPT can be loaded and run as a standard causal language model through the transformers interface:

```python
# Load model and tokenizer
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
model = AutoModel.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
```

However, to make the full functionality of the Precious3-GPT model convenient to use, we provide a handler.

### Run model using Precious3-GPT handler step by step

**Step 1 - download Precious3-GPT [handler.py](https://huggingface.co/insilicomedicine/precious3-gpt/blob/main/handler.py)**

```python
from handler import EndpointHandler

precious3gpt_handler = EndpointHandler()
```

**Step 2 - create input for the handler**

```python
import json

with open('./generation-configs/meta2diff.json', 'r') as f:
    config_data = json.load(f)

# prepare request configuration
request_config = {
    "inputs": config_data,
    "mode": "meta2diff",
    "parameters": {
        "temperature": 0.8,
        "top_p": 0.2,
        "top_k": 3550,
        "n_next_tokens": 50,
        "random_seed": 137
    }
}
```

**How Precious3-GPT will see the given request**

```text
[BOS]lung EFO_0000768 expression curcumin 70.0-80.0 80.0-90.0 m human
```

**Step 3 - run Precious3-GPT**

```python
output = precious3gpt_handler(request_config)
```

**Handler output structure**

```json
{
  "output": {
    "up": List,
    "down": List
  },
  "mode": String, // Generation mode that was selected
  "message": "Done!", // or Error
  "input": String // Input prompt that was passed
}
```

Note: If the `mode` was supposed to generate compounds, the output would contain `compounds: List`.

---

## Precious3-GPT request configuration

### Generation Modes (`mode` in config)

Choose the appropriate mode based on your requirements:

1. **meta2diff**: Generate a signature (up- and down-regulated gene lists) given meta-data such as tissue, compound, gender, etc.
2. **diff2compound**: Predict compounds based on a signature.
3. **meta2diff2compound**: Generate signatures given meta-data and then predict compounds based on the generated signatures.

---

### Instruction (`inputs.instruction` in config)

1. disease2diff2disease - generate a signature for a disease / predict a disease based on a given signature
2. compound2diff2compound - generate a signature for a compound / predict a compound based on a given signature
3. age_group2diff2age_group - generate a signature for an age group / predict an age group based on a given signature

### Other meta-data (`inputs.` in config)

The full list of available values for each meta-data item can be found in `p3_entities_with_type.csv`.

## Examples

In the following examples all possible configuration fields are specified. You can leave some meta-data fields in the `inputs` section as an empty string (`""`) or an empty list (`[]`).

_**Example 1**_

If you want to generate a signature given specific meta-data, you can use the following configuration. Note that the `up` and `down` fields are empty lists because you want the model to generate them. Here we ask the model to generate a signature for a male human in the 70-90 years age group, in lung tissue, with disease EFO_0000768.
```json
{
  "inputs": {
    "instruction": ["age_group2diff2age_group", "disease2diff2disease", "compound2diff2compound"],
    "tissue": ["lung"],
    "age": "",
    "cell": "",
    "efo": "EFO_0000768",
    "datatype": "",
    "drug": "",
    "dose": "",
    "time": "",
    "case": ["70.0-80.0", "80.0-90.0"],
    "control": "",
    "dataset_type": "expression",
    "gender": "m",
    "species": "human",
    "up": [],
    "down": []
  },
  "mode": "meta2diff",
  "parameters": {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50,
    "random_seed": 137
  }
}
```

Here is the output:

```json
{
  "output": {
    "up": [["PTGDR2", "CABYR", "MGAM", "TMED9", "SHOX2", "MAT1A", "MUC5AC", "GASK1B", "CYP1A2", "RP11-266K4.9", ...]], // generated list of up-regulated genes
    "down": [["MB", "OR10V1", "OR51H1", "GOLGA6L10", "OR6M1", "CDX4", "OR4C45", "SPRR2A", "SPDYE9", "GBX2", "ATP4B", ...]] // generated list of down-regulated genes
  },
  "mode": "meta2diff", // generation mode we specified
  "message": "Done!",
  "input": "[BOS]lung EFO_0000768 70.0-80.0 80.0-90.0 expression m human ", // actual input prompt for the model
  "random_seed": 137
}
```

_**Example 2**_

Now let's generate a signature for a healthy male human in the 40-50 years age group, in whole blood. Note that here we use the `disease2diff2disease` instruction but expect to generate a signature for a healthy human, which is why we set `efo` to an empty string (`""`).
Alternatively, for this example we can add one more instruction: `"instruction": ["disease2diff2disease", "age_group2diff2age_group"]`

```json
{
  "inputs": {
    "instruction": ["disease2diff2disease", "age_group2diff2age_group"],
    "tissue": ["whole blood"],
    "age": "",
    "cell": "",
    "efo": "",
    "datatype": "",
    "drug": "",
    "dose": "",
    "time": "",
    "case": "40.0-50.0",
    "control": "",
    "dataset_type": "expression",
    "gender": "m",
    "species": "human",
    "up": [],
    "down": []
  },
  "mode": "meta2diff",
  "parameters": {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50,
    "random_seed": 137
  }
}
```

Here is the output:

```json
{
  "output": {
    "up": [["IER3", "APOC2", "EDNRB", "JAKMIP2", "BACE2", ... ]],
    "down": [["TBL1Y", "TDP1", "PLPP4", "CPEB1", "ITPR3", ... ]]
  },
  "mode": "meta2diff",
  "message": "Done!",
  "input": "[BOS]whole blood 40.0-50.0 expression m human ",
  "random_seed": 137
}
```
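
The example configurations above share the same skeleton: a fixed set of `inputs` fields, a `mode`, and generation `parameters`. As a minimal sketch (not part of `handler.py`), the helper below fills every unspecified field with the empty defaults used in the examples. The names `build_request`, `VALID_MODES`, `EMPTY_INPUTS`, and `DEFAULT_PARAMETERS` are hypothetical, and the mode check only covers the three modes listed in this README; the handler itself may accept or validate more.

```python
import copy
import json

# Modes listed in this README (assumption: the handler may support others).
VALID_MODES = {"meta2diff", "diff2compound", "meta2diff2compound"}

# Field names and empty defaults copied from the example configurations above.
EMPTY_INPUTS = {
    "instruction": [], "tissue": [], "age": "", "cell": "", "efo": "",
    "datatype": "", "drug": "", "dose": "", "time": "", "case": "",
    "control": "", "dataset_type": "", "gender": "", "species": "",
    "up": [], "down": [],
}

# Generation parameters used in the examples above.
DEFAULT_PARAMETERS = {
    "temperature": 0.8, "top_p": 0.2, "top_k": 3550,
    "n_next_tokens": 50, "random_seed": 137,
}


def build_request(mode, parameters=None, **inputs):
    """Hypothetical helper: assemble a request dict shaped like the examples."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    unknown = set(inputs) - set(EMPTY_INPUTS)
    if unknown:
        raise KeyError(f"unknown input fields: {sorted(unknown)}")
    # Deep copy so the list defaults are never shared between requests.
    merged_inputs = copy.deepcopy(EMPTY_INPUTS)
    merged_inputs.update(inputs)
    return {
        "inputs": merged_inputs,
        "mode": mode,
        "parameters": {**DEFAULT_PARAMETERS, **(parameters or {})},
    }


# Rebuild the Example 2 configuration.
request_config = build_request(
    "meta2diff",
    instruction=["disease2diff2disease", "age_group2diff2age_group"],
    tissue=["whole blood"],
    case="40.0-50.0",
    dataset_type="expression",
    gender="m",
    species="human",
)
print(json.dumps(request_config, indent=2))
```

The resulting `request_config` has the same shape as the dictionaries passed to `precious3gpt_handler` in Step 3, so it could be used in place of a hand-written configuration.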