---
license: apache-2.0
datasets:
- HUMADEX/polish_ner_dataset
language:
- pl
metrics:
- f1
- recall
- precision
- confusion_matrix
base_model:
- google-bert/bert-base-cased
pipeline_tag: token-classification
tags:
- NER
- medical
- extraction
- symptom
- polish
---
|
# Polish Medical NER
|
|
|
## Acknowledgement
|
|
|
This model was created as part of joint research by the HUMADEX research group (https://www.linkedin.com/company/101563689/) and has received funding from the European Union Horizon Europe Research and Innovation Programme project SMILE (grant number 101080923) and the Marie Skłodowska-Curie Actions (MSCA) Doctoral Networks project BosomShield (grant number 101073222). Responsibility for the information and views expressed herein lies entirely with the authors.
|
Authors:

Dr. Izidor Mlakar, Rigon Sallauka, Dr. Umut Arioz, Dr. Matej Rojc
|
|
|
## Use

- **Primary Use Case**: This model is designed to extract medical entities such as symptoms, diagnostic tests, and treatments from clinical text in the Polish language.
- **Applications**: Suitable for healthcare professionals, clinical data analysis, and research into medical text processing.
- **Supported Entity Types**:
  - `PROBLEM`: Diseases, symptoms, and medical conditions.
  - `TEST`: Diagnostic procedures and laboratory tests.
  - `TREATMENT`: Medications, therapies, and other medical interventions.
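Token-classification models like this one typically emit the entity types above as token-level BIO tags (`B-PROBLEM`, `I-PROBLEM`, `O`, and so on). As a minimal sketch, assuming the standard BIO scheme (the exact label names come from the model's config), such tags can be collected back into entity spans like this:

```python
def bio_to_spans(tokens, labels):
    """Collect (entity_type, entity_text) spans from BIO-tagged tokens."""
    spans, start, etype = [], None, None
    for i, label in enumerate(labels + ["O"]):  # sentinel "O" flushes the last open span
        # Close the current span on "O", on a new "B-", or on a mismatched "I-"
        if etype is not None and (label == "O" or label.startswith("B-") or label[2:] != etype):
            spans.append((etype, " ".join(tokens[start:i])))
            etype = None
        if label.startswith("B-") or (label.startswith("I-") and etype is None):
            start, etype = i, label[2:]
    return spans

tokens = ["Pacjent", "ma", "silne", "bóle", "głowy"]
labels = ["O", "O", "B-PROBLEM", "I-PROBLEM", "I-PROBLEM"]
print(bio_to_spans(tokens, labels))  # [('PROBLEM', 'silne bóle głowy')]
```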
|
|
|
## Training Data

- **Data Sources**: Annotated datasets, including clinical data and translations of English medical text into Polish.
- **Data Augmentation**: The training dataset underwent data augmentation techniques to improve the model's ability to generalize to different text structures.
- **Dataset Split**:
  - **Training Set**: 80%
  - **Validation Set**: 10%
  - **Test Set**: 10%
|
|
|
## Model Training

- **Training Configuration**:
  - **Optimizer**: AdamW
  - **Learning Rate**: 3e-5
  - **Batch Size**: 64
  - **Epochs**: 200
  - **Loss Function**: Focal Loss to handle class imbalance
  - **Frameworks**: PyTorch, Hugging Face Transformers, SimpleTransformers
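Focal loss down-weights well-classified examples relative to plain cross-entropy, which helps when `O` tokens vastly outnumber entity tokens. As a single-example illustration of the formula FL(p_t) = -(1 - p_t)^γ · log(p_t) (pure Python for clarity; the actual training used a PyTorch implementation):

```python
import math

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)**gamma * log(p_t),
    where p_t is the predicted probability of the true class.
    With gamma = 0 it reduces to ordinary cross-entropy."""
    pt = probs[target]
    return -((1.0 - pt) ** gamma) * math.log(pt)

# A confidently correct prediction is down-weighted far more
# than a confidently wrong one, focusing training on hard cases.
easy = focal_loss([0.1, 0.9], target=1)  # p_t = 0.9 -> small loss
hard = focal_loss([0.7, 0.3], target=1)  # p_t = 0.3 -> large loss
assert easy < hard
```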
|
|
|
## Evaluation Metrics

- eval_loss = 0.3968946770636102
- f1_score = 0.7556232119891866
- precision = 0.7552069671056083
- recall = 0.7560399159663865
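As a quick sanity check, the reported F1 score is consistent with the harmonic mean of the reported precision and recall:

```python
precision = 0.7552069671056083
recall = 0.7560399159663865

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 6))  # agrees with the reported f1_score to ~6 decimal places
```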
|
|
|
Visit [HUMADEX/Weekly-Supervised-NER-pipline](https://github.com/HUMADEX/Weekly-Supervised-NER-pipline) for more info.
|
|
|
## How to Use

You can easily use this model with the Hugging Face `transformers` library. Here's an example of how to load and use the model for inference:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "HUMADEX/polish_medical_ner"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Sample text for inference
text = "Pacjent skarżył się na silne bóle głowy i nudności, które utrzymywały się przez dwa dni. W celu złagodzenia objawów przepisano mu paracetamol oraz zalecono odpoczynek i picie dużej ilości płynów."

# Tokenize the input text
inputs = tokenizer(text, return_tensors="pt")
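
# --- Continuation sketch (not from the original card): run the model and map
# --- logits back to label names via model.config.id2label, the standard
# --- transformers token-classification pattern.
import torch

with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring label for each token
predictions = outputs.logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, predictions):
    print(token, model.config.id2label[label_id])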