Lowenzahn
/

PathoIE-Llama-2-7B

PEFT

English


Lowenzahn committed on Aug 2, 2024

Commit

164c429


1 Parent(s): 6add6b1

Update README.md


README.md +97 -3


@@ -5,13 +5,20 @@ license: apache-2.0
 language:
 - en
 ---
 ## Training:
 Check out our GitHub repository: https://github.com/HIRC-SNUBH/Curation_LLM_PathoReport.git
 ## Inference
-Since the model was trained using instructions in the ChatML format, modifications to the tokenizer are required.
 ``` python
 from datasets import load_dataset
@@ -23,7 +30,7 @@ base_model = AutoModelForCausalLM.from_pretrained(
     'meta-llama/Llama-2-7b-hf',
     trust_remote_code=True,
     device_map="auto",
-    torch_dtype=torch.bfloat16,   # If you have insufficient VRAM, lower the precision.
 )
 # Load tokenizer
@@ -42,11 +49,98 @@ model.config.eos_token_id = tokenizer.eos_token_id
 # Load PEFT
 model = PeftModel.from_pretrained(base_model, 'Lowenzahn/PathoIE-Llama-2-7B')
 model = model.eval()
 ```
-- PEFT 0.4.0
 ## Citation
 ```

 language:
 - en
 ---
+# PathoIE-Llama-2-7B
+<img src="https://cdn-uploads.huggingface.co/production/uploads/646704281dd5854d4de2cdda/HkfJMfEwnqA2wX6T-DODW.webp" width="500" />
 ## Training:
 Check out our GitHub repository: https://github.com/HIRC-SNUBH/Curation_LLM_PathoReport.git
+- PEFT 0.4.0
 ## Inference
+Since the model was trained using instructions following the ChatML template, modifications to the tokenizer are required.
 ``` python
 from datasets import load_dataset
     'meta-llama/Llama-2-7b-hf',
     trust_remote_code=True,
     device_map="auto",
+    torch_dtype=torch.bfloat16,   # Optional; lower the precision if you have insufficient VRAM.
 )
 # Load tokenizer
 # Load PEFT
 model = PeftModel.from_pretrained(base_model, 'Lowenzahn/PathoIE-Llama-2-7B')
+model = model.merge_and_unload()
 model = model.eval()
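+# Inference
+prompts = ["Machine learning is"]
+# Move the tokenized inputs to the model's device before generating.
+inputs = tokenizer(prompts, return_tensors="pt").to(model.device)
+# Greedy decoding: with do_sample=False, sampling parameters such as top_p and temperature are not used.
+gen_kwargs = {"max_new_tokens": 1024, "do_sample": False, "repetition_penalty": 1.0}
+# Pass the attention mask together with the input IDs.
+output = model.generate(**inputs, **gen_kwargs)
+output = tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
+print(output)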
 ```
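The string returned by `tokenizer.decode` contains the prompt followed by the completion, and the model is instructed to answer in JSON only. The following is a minimal sketch of how the JSON object could be pulled out of such a string, assuming the completion holds a single top-level JSON object. `extract_json` is a hypothetical helper that is not part of the repository, and `sample` is a made-up placeholder string, not real model output.

```python
import json


def extract_json(decoded: str):
    """Return the first JSON object found in `decoded`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = decoded.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _ = decoder.raw_decode(decoded, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        # This brace did not open a valid object; try the next one.
        start = decoded.find("{", start + 1)
    return None


# Placeholder text standing in for a decoded generation (illustrative only).
sample = 'pathologist\n{"MORPHOLOGY_DIAGNOSIS": "adenosquamous carcinoma", "MAIN_BRONCHUS": "not submitted"}'
result = extract_json(sample)
print(result)
```

Scanning for the first parseable object, instead of calling `json.loads` on the whole string, tolerates prompt text before the answer and stray text after it.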
+## Prompt example
+The pathology report used below is a fictitious example.
+```
+<|im_start|> system
+You are a pathologist who specialized in lung cancer.
+Your task is extracting informations requested by the user from the lung cancer pathology report and formatting extracted informations into JSON.
+The information to be extracted is clearly specified in the report, so one must avoid from inferring information that is not present.
+Remember, you MUST answer in JSON only. Avoid any additional explanations. user
+Extract the following informations (value-set) from the report I provide.
+If the required information to extract each value in the value-set is not present in the pathology report, consider it as 'not submitted'.<|im_end|>
+<|im_start|> user
+Extract the following informations (value-set) from the report I provide.
+If the required information to extract each value in the value-set is not present in the pathology report, consider it as 'not submitted'.
+<value-set>
+- MORPHOLOGY_DIAGNOSIS
+- SUBTYPE_DOMINANT
+- MAX_SIZE_OF_TUMOR(invasive component only)
+- MAX_SIZE_OF_TUMOR(including CIS=AIS)
+- INVASION_TO_VISCERAL_PLEURAL
+- MAIN_BRONCHUS
+- INVASION_TO_CHEST_WALL
+- INVASION_TO_PARIETAL_PLEURA
+- INVASION_TO_PERICARDIUM
+- INVASION_TO_PHRENIC_NERVE
+- TUMOR_SIZE_CNT
+- LUNG_TO_LUNG_METASTASIS
+- INTRAPULMONARY_METASTASIS
+- SATELLITE_TUMOR_LOCATION
+- SEPARATE_TUMOR_LOCATION
+- INVASION_TO_MEDIASTINUM
+- INVASION_TO_DIAPHRAGM
+- INVASION_TO_HEART
+- INVASION_TO_RECURRENT_LARYNGEAL_NERVE
+- INVASION_TO_TRACHEA
+- INVASION_TO_ESOPHAGUS
+- INVASION_TO_SPINE
+- METASTATIC_RIGHT_UPPER_LOBE
+- METASTATIC_RIGHT_MIDDLE_LOBE
+- METASTATIC_RIGHT_LOWER_LOBE
+- METASTATIC_LEFT_UPPER_LOBE
+- METASTATIC_LEFT_LOWER_LOBE
+- INVASION_TO_AORTA
+- INVASION_TO_SVC
+- INVASION_TO_IVC
+- INVASION_TO_PULMONARY_ARTERY
+- INVASION_TO_PULMONARY_VEIN
+- INVASION_TO_CARINA
+- PRIMARY_CANCER_LOCATION_RIGHT_UPPER_LOBE
+- PRIMARY_CANCER_LOCATION_RIGHT_MIDDLE_LOBE
+- PRIMARY_CANCER_LOCATION_RIGHT_LOWER_LOBE
+- PRIMARY_CANCER_LOCATION_LEFT_UPPER_LOBE
+- PRIMARY_CANCER_LOCATION_LEFT_LOWER_LOBE
+- RELATED_TO_ATELECTASIS_OR_OBSTRUCTIVE_PNEUMONITIS
+- PRIMARY_SITE_LATERALITY
+- LYMPH_METASTASIS_SITES
+- NUMER_OF_LYMPH_NODE_META_CASES
+---
+<report>
+[A] Lung, left lower lobe, lobectomy
+1. ADENOSQUAMOUS CARCINOMA [by 2015 WHO classification]
+- other subtype: acinar (50%), lepidic (30%), solid (20%)
+    1) Pre-operative / Previous treatment: not done
+    2) Histologic grade: moderately differentiated
+    3) Size of tumor:
+        a. Invasive component only: 3.5 x 2.5 x 1.3 cm, 2.4 x 2.3 x 1.1 cm
+        b. Including CIS component: 3.9 x 2.6 x 1.3 cm, 3.8 x 3.1 x 1.2 cm
+    4) Extent of invasion
+        a. Invasion to visceral pleura: PRESENT (P2)
+        b. Invasion to superior vena cava: present
+    5) Main bronchus: not submitted
+    6) Necrosis: absent
+    7) Resection margin: free from carcinoma (safety margin: 1.1 cm)
+    8) Lymph node: metastasis in 2 out of 10 regional lymph nodes
+        (peribronchial lymph node: 1/3, LN#5,6 :0/1, LN#7:0/3, LN#12: 1/2)
+<|im_end|>
+<|im_start|> pathologist
+```
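The prompt example follows a fixed layout: a `system` turn, a `user` turn holding the instruction, the `<value-set>` list, a `---` separator and the `<report>`, and finally an open `pathologist` turn for the model to complete. A minimal sketch of assembling that layout programmatically is given here. `build_prompt` and its parameter names are hypothetical, and the exact whitespace (including the trailing newline after `pathologist`) is an assumption read off the example, so it should be checked against the training code in the GitHub repository.

```python
from typing import Sequence


def build_prompt(system: str, instruction: str, fields: Sequence[str], report: str) -> str:
    """Assemble a ChatML-style prompt laid out like the example in this model card."""
    # One "- NAME" bullet per field to extract.
    value_set = "\n".join(f"- {name}" for name in fields)
    return (
        f"<|im_start|> system\n{system}<|im_end|>\n"
        f"<|im_start|> user\n{instruction}\n"
        f"<value-set>\n{value_set}\n---\n"
        f"<report>\n{report}\n<|im_end|>\n"
        # The final turn is left open so the model writes the answer.
        f"<|im_start|> pathologist\n"
    )


prompt = build_prompt(
    system="You are a pathologist who specialized in lung cancer.",
    instruction="Extract the following informations (value-set) from the report I provide.",
    fields=["MORPHOLOGY_DIAGNOSIS", "MAIN_BRONCHUS"],
    report="[A] Lung, left lower lobe, lobectomy",
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the `prompts` entry used in the inference snippet.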
 ## Citation
 ```