Lowenzahn committed on
Commit 164c429 (verified) · 1 parent: 6add6b1

Update README.md

Files changed (1)
  1. README.md +97 -3
README.md CHANGED
@@ -5,13 +5,20 @@ license: apache-2.0
  language:
  - en
  ---
  ## Training:

  Check out our GitHub repository: https://github.com/HIRC-SNUBH/Curation_LLM_PathoReport.git

  ## Inference

- Since the model was trained using instructions in the ChatML format, modifications to the tokenizer are required.

  ``` python
  from datasets import load_dataset
@@ -23,7 +30,7 @@ base_model = AutoModelForCausalLM.from_pretrained(
  'meta-llama/Llama-2-7b-hf',
  trust_remote_code=True,
  device_map="auto",
- torch_dtype=torch.bfloat16, # If you have insufficient VRAM, lower the precision.
  )

  # Load tokenizer
@@ -42,11 +49,98 @@ model.config.eos_token_id = tokenizer.eos_token_id

  # Load PEFT
  model = PeftModel.from_pretrained(base_model, 'Lowenzahn/PathoIE-Llama-2-7B')
  model = model.eval()
  ```

- - PEFT 0.4.0


  ## Citation
  ```
 
  language:
  - en
  ---
+ # PathoIE-Llama-2-7B
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/646704281dd5854d4de2cdda/HkfJMfEwnqA2wX6T-DODW.webp" width="500" />
+
+
  ## Training:

  Check out our GitHub repository: https://github.com/HIRC-SNUBH/Curation_LLM_PathoReport.git

+ - PEFT 0.4.0
+
  ## Inference

+ Because the model was trained on instructions that follow the ChatML template, the tokenizer must be modified before inference.

  ``` python
  from datasets import load_dataset
 
  'meta-llama/Llama-2-7b-hf',
  trust_remote_code=True,
  device_map="auto",
+ torch_dtype=torch.bfloat16, # Optional: lower the precision (e.g. bfloat16) if you have insufficient VRAM.
  )

  # Load tokenizer
 
  # Load PEFT
  model = PeftModel.from_pretrained(base_model, 'Lowenzahn/PathoIE-Llama-2-7B')
+ model = model.merge_and_unload()
  model = model.eval()
+
+ # Inference
+ prompts = ["Machine learning is"]  # Placeholder: replace with a ChatML prompt like the prompt example below
+ inputs = tokenizer(prompts, return_tensors="pt").to(model.device)  # Move input_ids and attention_mask to the model's device
+ gen_kwargs = {"max_new_tokens": 1024, "do_sample": False, "repetition_penalty": 1.0}  # Greedy decoding; top_p/temperature have no effect when do_sample=False
+ output = model.generate(**inputs, **gen_kwargs)
+ output = tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
+ print(output)
  ```


+ # Prompt example
+
+ The pathology report used below is fictional.
+
+ ```
+ <|im_start|> system
+ You are a pathologist who specialized in lung cancer.
+ Your task is extracting informations requested by the user from the lung cancer pathology report and formatting extracted informations into JSON.
+ The information to be extracted is clearly specified in the report, so one must avoid from inferring information that is not present.
+ Remember, you MUST answer in JSON only. Avoid any additional explanations. user
+ Extract the following informations (value-set) from the report I provide.
+ If the required information to extract each value in the value-set is not present in the pathology report, consider it as 'not submitted'.<|im_end|>
+ <|im_start|> user
+ Extract the following informations (value-set) from the report I provide.
+ If the required information to extract each value in the value-set is not present in the pathology report, consider it as 'not submitted'.
+ <value-set>
+ - MORPHOLOGY_DIAGNOSIS
+ - SUBTYPE_DOMINANT
+ - MAX_SIZE_OF_TUMOR(invasive component only)
+ - MAX_SIZE_OF_TUMOR(including CIS=AIS)
+ - INVASION_TO_VISCERAL_PLEURAL
+ - MAIN_BRONCHUS
+ - INVASION_TO_CHEST_WALL
+ - INVASION_TO_PARIETAL_PLEURA
+ - INVASION_TO_PERICARDIUM
+ - INVASION_TO_PHRENIC_NERVE
+ - TUMOR_SIZE_CNT
+ - LUNG_TO_LUNG_METASTASIS
+ - INTRAPULMONARY_METASTASIS
+ - SATELLITE_TUMOR_LOCATION
+ - SEPARATE_TUMOR_LOCATION
+ - INVASION_TO_MEDIASTINUM
+ - INVASION_TO_DIAPHRAGM
+ - INVASION_TO_HEART
+ - INVASION_TO_RECURRENT_LARYNGEAL_NERVE
+ - INVASION_TO_TRACHEA
+ - INVASION_TO_ESOPHAGUS
+ - INVASION_TO_SPINE
+ - METASTATIC_RIGHT_UPPER_LOBE
+ - METASTATIC_RIGHT_MIDDLE_LOBE
+ - METASTATIC_RIGHT_LOWER_LOBE
+ - METASTATIC_LEFT_UPPER_LOBE
+ - METASTATIC_LEFT_LOWER_LOBE
+ - INVASION_TO_AORTA
+ - INVASION_TO_SVC
+ - INVASION_TO_IVC
+ - INVASION_TO_PULMONARY_ARTERY
+ - INVASION_TO_PULMONARY_VEIN
+ - INVASION_TO_CARINA
+ - PRIMARY_CANCER_LOCATION_RIGHT_UPPER_LOBE
+ - PRIMARY_CANCER_LOCATION_RIGHT_MIDDLE_LOBE
+ - PRIMARY_CANCER_LOCATION_RIGHT_LOWER_LOBE
+ - PRIMARY_CANCER_LOCATION_LEFT_UPPER_LOBE
+ - PRIMARY_CANCER_LOCATION_LEFT_LOWER_LOBE
+ - RELATED_TO_ATELECTASIS_OR_OBSTRUCTIVE_PNEUMONITIS
+ - PRIMARY_SITE_LATERALITY
+ - LYMPH_METASTASIS_SITES
+ - NUMER_OF_LYMPH_NODE_META_CASES
+ ---
+ <report>
+ [A] Lung, left lower lobe, lobectomy
+ 1. ADENOSQUAMOUS CARCINOMA [by 2015 WHO classification]
+ - other subtype: acinar (50%), lepidic (30%), solid (20%)
+ 1) Pre-operative / Previous treatment: not done
+ 2) Histologic grade: moderately differentiated
+ 3) Size of tumor:
+ a. Invasive component only: 3.5 x 2.5 x 1.3 cm, 2.4 x 2.3 x 1.1 cm
+ b. Including CIS component: 3.9 x 2.6 x 1.3 cm, 3.8 x 3.1 x 1.2 cm
+ 4) Extent of invasion
+ a. Invasion to visceral pleura: PRESENT (P2)
+ b. Invasion to superior vena cava: present
+ 5) Main bronchus: not submitted
+ 6) Necrosis: absent
+ 7) Resection margin: free from carcinoma (safey margin: 1.1 cm)
+ 8) Lymph node: metastasis in 2 out of 10 regional lymph nodes
+ (peribronchial lymph node: 1/3, LN#5,6 :0/1, LN#7:0/3, LN#12: 1/2)
+ <|im_end|>
+ <|im_start|> pathologist
+ ```
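The prompt example above uses a fixed ChatML layout. Each turn opens with `<|im_start|>` and a role name and closes with `<|im_end|>`, and a final `pathologist` turn is left open for the model to complete. A minimal sketch of assembling such a prompt in Python, assuming exactly the spacing and role names shown in the example; the helper `build_chatml_prompt` is hypothetical and not part of the PathoIE repository.

```python
from typing import List, Sequence, Tuple


def build_chatml_prompt(
    turns: Sequence[Tuple[str, str]], reply_role: str = "pathologist"
) -> str:
    """Assemble a ChatML-style prompt from (role, content) pairs.

    Hypothetical helper: it copies the layout of the prompt example
    ("<|im_start|> role", newline, content, "<|im_end|>") and is not
    taken from the PathoIE repository.
    """
    parts: List[str] = []
    for role, content in turns:
        # One closed turn: header line, message body, end marker.
        parts.append(f"<|im_start|> {role}\n{content}<|im_end|>\n")
    # Leave the reply turn open so generation continues from this point.
    parts.append(f"<|im_start|> {reply_role}\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_chatml_prompt([
        ("system", "You are a pathologist who specialized in lung cancer."),
        ("user", "<report>\n[A] Lung, left lower lobe, lobectomy\n"),
    ])
    print(prompt)
```

The resulting string would take the place of the placeholder `"Machine learning is"` in the inference snippet. That training used exactly this whitespace (for example, the space after `<|im_start|>`) is an assumption drawn only from the example; check the tokenizer setup in the linked GitHub repository before relying on it.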

  ## Citation
  ```