Model Card for MedChat3.5

Model Details

Model Description

MedChat3.5 is a specialized language model built on OpenChat 3.5 and fine-tuned for biomedical natural language processing (NLP) tasks. It was fine-tuned on the Llama2-MedTuned-Instructions dataset, which contains approximately 200,000 samples designed for instruction-based learning in biomedical contexts. The model targets tasks such as Named Entity Recognition (NER), Relation Extraction (RE), Medical Natural Language Inference (NLI), Document Classification, and Question Answering (QA).

Developed by: Imran Ullah
Model type: Language Model (LM), fine-tuned for medical NLP
Language(s) (NLP): English (Biomedical Text)
License: MIT
Fine-tuned from model: OpenChat 3.5

Dataset Information

Dataset Name: Llama2-MedTuned-Instructions

Dataset Description

Llama2-MedTuned-Instructions is an instruction-based dataset for training language models on biomedical NLP tasks. Its approximately 200,000 samples cover tasks such as Named Entity Recognition (NER), Relation Extraction (RE), Medical Natural Language Inference (NLI), Document Classification, and Question Answering (QA). It combines subsets of established biomedical datasets, listed below, so that a single instruction format spans several task types and data sources.

Source Datasets and Composition

Named Entity Recognition (NER): NCBI-disease, BC5CDR-disease, BC5CDR-chem, BC2GM, JNLPBA, i2b2-2012
Relation Extraction (RE): i2b2-2010, GAD
Natural Language Inference (NLI): MedNLI
Document Classification: Hallmarks of cancer (HoC)
Question Answering (QA): ChatDoctor, PMC-Llama-Instructions

Prompting Strategy

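Each sample in the dataset follows a three-part structure of Instruction, Input, and Output. The instruction describes the task, the input holds the text to process, and the output is the expected answer. This structure is what enables instruction-based learning.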
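The three-part structure can be illustrated with a small formatting helper. This is only a sketch. The field names `instruction`, `input` and `output` and the `### Instruction:`-style section headers are assumptions made for illustration. They are not the documented template of Llama2-MedTuned-Instructions. Both `format_sample` and the example record are hypothetical.

```python
# Hypothetical helper: the section headers below are illustrative and are NOT
# taken from the Llama2-MedTuned-Instructions documentation.
def format_sample(instruction: str, input_text: str, output: str = "") -> str:
    """Render one Instruction/Input/Output sample as a single text string."""
    parts = [
        f"### Instruction:\n{instruction.strip()}",
        f"### Input:\n{input_text.strip()}",
        f"### Output:\n{output.strip()}",
    ]
    return "\n\n".join(parts)


# Hypothetical NER-style record, invented for illustration only.
sample = {
    "instruction": "Identify all disease mentions in the input text.",
    "input": "The patient was diagnosed with type 2 diabetes.",
    "output": "type 2 diabetes",
}
print(format_sample(sample["instruction"], sample["input"], sample["output"]))
```

At inference time the `output` argument would be left empty. The prompt then ends right after the output header, and the model is expected to complete it.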

Usage and Application

The dataset suits training and evaluating models on biomedical NLP tasks. It can also serve as a benchmark for domain-specific performance, for example when comparing against established models such as BioBERT and BioClinicalBERT.

Inference Instructions

To run MedChat3.5 for inference, use the code snippet below with the Transformers library. Install the required packages and authenticate with a Hugging Face access token. You can adjust the sampling parameters temperature, top_p, and top_k to control generation behavior. The model is intended for tasks such as question answering and response generation in biomedical contexts.
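The sketch below shows what these three parameters do. In plain Python, it samples one token id from a list of raw logits. It applies temperature scaling first, then top-k, then top-p (nucleus) filtering. This is a simplified model of sampling with `do_sample=True` and not the Transformers implementation itself. `sample_next_token` is a hypothetical helper.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.9, rng=random):
    """Hypothetical helper: pick one token id using temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # 1. Temperature: dividing logits by a value < 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]

    # 2. Softmax, subtracting the max logit for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 3. Top-k: keep only the k most likely ids (top_k <= 0 disables the filter).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # 4. Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    mass = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # 5. Draw one id in proportion to the surviving probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


rng = random.Random(0)
print(sample_next_token([2.0, 1.0, 0.5, -1.0], rng=rng))
```

Lowering `temperature`, `top_k`, or `top_p` shrinks the set of likely candidate tokens, which makes output more deterministic. With `top_k=1`, this sketch reduces to greedy decoding.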

# Example Inference Code
!pip install -q --upgrade git+https://github.com/huggingface/transformers.git
!pip install -q accelerate datasets bitsandbytes peft

# Use your own Hugging Face access token (stored here as a Colab secret named HF_TOKEN)
from google.colab import userdata
hf_token = userdata.get('HF_TOKEN')

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer

path = "Imran1/MedChat3.5"

# Load the model with 4-bit quantization (requires bitsandbytes and a CUDA GPU)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    path,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    quantization_config=bnb_config,
    device_map="auto",
    token=hf_token,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(path, token=hf_token)

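# Use the model's EOS token id, and reuse EOS as the padding token
tokenizer.eos_token_id = model.config.eos_token_id
tokenizer.pad_token = tokenizer.eos_token
streamer = TextStreamer(tokenizer)  # prints tokens to stdout as they are generated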

# Prompt written in the OpenChat 3.5 turn format
tx = '''
GPT4 Correct Assistant: you are a stomach specialist.<|end_of_turn|>
GPT4 Correct User: What role does gastric acid play in the process of digestion, and how does the stomach regulate its secretion to maintain a healthy digestive environment?<|end_of_turn|>
GPT4 Correct Assistant:
'''

import warnings
warnings.filterwarnings('ignore')  # Ignore all warnings

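# Tokenize the prompt (keeping the attention mask) and move it to the model's device
inputs = tokenizer(tx, return_tensors="pt").to(model.device)
# Generation settings: sampling with temperature, top-p and top-k filtering
generation_params = {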
    'max_new_tokens': 500,
    'use_cache': True,
    'do_sample': True,
    'temperature': 0.7,
    'top_p': 0.9,
    'top_k': 50
}

outputs = model.generate(**inputs, **generation_params, streamer=streamer)
decoded_outputs = tokenizer.batch_decode(outputs)

# output
'''
<s> 
GPT4 Correct Assistant:  you are stomach specialist.<|end_of_turn|> 
GPT4 Correct User: What role does gastric acid play in the process of digestion, and how does the stomach regulate its secretion to maintain a healthy digestive environment?<|end_of_turn|> 
GPT4 Correct Assistant:
Gastric acid plays a crucial role in the process of digestion by breaking down food into its basic components. It is secreted by the cells lining the stomach, known as parietal cells, in response to the presence of food in the stomach.

The stomach regulates the secretion of gastric acid through a series of mechanisms that maintain a healthy digestive environment. The primary mechanism is the release of gastrin, a hormone produced by the stomach's G-cells in response to the presence of food. Gastrin stimulates the parietal cells to secrete gastric acid, which in turn aids in the breakdown of food.

The stomach also regulates the secretion of gastric acid through the release of histamine, which is produced by the ECL cells in response to the presence of food. Histamine acts on the parietal cells to stimulate gastric acid secretion.

Another mechanism involves the production of intrinsic factor, a protein produced by the stomach's mucous cells. Intrinsic factor is essential for the absorption of vitamin B12 in the small intestine. The production of intrinsic factor is regulated by gastric acid, which helps maintain a healthy balance of this essential nutrient.

Additionally, the stomach regulates the secretion of gastric acid through the release of somatostatin, a hormone produced by the D-cells of the stomach. Somatostatin inhibits gastric acid secretion, helping to maintain a healthy balance between acid production and neutralization.

In summary, the stomach regulates the secretion of gastric acid through a series of mechanisms that maintain a healthy digestive environment. These mechanisms include the release of gastrin, histamine, and intrinsic factor, as well as the release of somatostatin. By maintaining a balance between acid production and neutralization, the stomach ensures that the digestive environment remains conducive to proper digestion and absorption of nutrients.<|end_of_turn|>
'''
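The prompt in the example above writes the OpenChat turn markers by hand. The hypothetical helpers below build such a prompt from a list of (role, text) turns and extract the final answer from the decoded text. They assume that MedChat3.5 keeps the OpenChat 3.5 format `GPT4 Correct User: ...<|end_of_turn|>GPT4 Correct Assistant:`, as used in the example. Turns are concatenated without newlines, following the OpenChat 3.5 model card. `build_openchat_prompt` and `extract_reply` are not part of Transformers or of the model repository.

```python
# Hypothetical helpers; labels and the <|end_of_turn|> marker follow the
# OpenChat 3.5 prompt format used in the example above.
END_OF_TURN = "<|end_of_turn|>"
ROLE_LABELS = {"user": "GPT4 Correct User", "assistant": "GPT4 Correct Assistant"}


def build_openchat_prompt(turns):
    """Join (role, text) pairs into one prompt and open a new assistant turn."""
    prompt = "".join(f"{ROLE_LABELS[role]}: {text}{END_OF_TURN}" for role, text in turns)
    # Ending with the bare assistant label asks the model to write the next reply.
    return prompt + f"{ROLE_LABELS['assistant']}:"


def extract_reply(decoded):
    """Return the text of the last assistant turn in a decoded generation."""
    last_turn = decoded.rsplit(f"{ROLE_LABELS['assistant']}:", 1)[-1]
    return last_turn.split(END_OF_TURN, 1)[0].strip()


# Usage with the model and tokenizer loaded earlier (commented out so this
# block runs on its own):
# prompt = build_openchat_prompt([("user", "What role does gastric acid play in digestion?")])
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True, temperature=0.7)
# print(extract_reply(tokenizer.batch_decode(outputs)[0]))
```

Returning only the text between the last assistant label and the next `<|end_of_turn|>` discards the echoed prompt. This is the part that `tokenizer.batch_decode` includes in the sample output above.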