---
license: gpl-3.0
language:
- en
metrics:
- accuracy
base_model: dmis-lab/ANGEL_pretrained
---
# Model Card for ANGEL_cometa
This model card provides detailed information about the ANGEL_cometa model, designed for biomedical entity linking.
# Model Details
#### Model Description
- **Developed by:** Chanhwi Kim, Hyunjae Kim, Sihyeon Park, Jiwoo Lee, Mujeen Sung, Jaewoo Kang
- **Model type:** Generative Biomedical Entity Linking Model
- **Language(s):** English
- **License:** GPL-3.0
- **Fine-tuned from model:** dmis-lab/ANGEL_pretrained (BART-large architecture)
#### Model Sources
- **GitHub Repository:** https://github.com/dmis-lab/ANGEL
- **Paper:** https://arxiv.org/pdf/2408.16493
# Direct Use
ANGEL_cometa is a model for biomedical entity linking, fine-tuned to identify and link entity mentions in the COMETA dataset.
To use this model, you need to set up a virtual environment and the accompanying inference code.
Start by cloning our [ANGEL GitHub repository](https://github.com/dmis-lab/ANGEL).
Then, run the following script to set up the environment:
```bash
bash script/environment/set_environment.sh
```
If you only want to run the model on a single sample, no preprocessing is required.
Simply execute the `run_sample.sh` script:
```bash
bash script/inference/run_sample.sh cometa
```
To replace the sample with your own example, refer to the [Direct Use](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#direct-use) section of our GitHub repository.
If you're interested in training or evaluating the model, check out the [Fine-tuning](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#fine-tuning) and [Evaluation](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#evaluation) sections.
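Alternatively, if you just want to query the checkpoint directly with the Hugging Face `transformers` library, a minimal sketch is shown below. It assumes the checkpoint loads as a standard BART sequence-to-sequence model under the repository ID `dmis-lab/ANGEL_cometa`, and that mentions are marked with START/END tokens as in the repository's sample input; note that the official scripts additionally apply prefix-constrained decoding over the target dictionary, which a plain `generate()` call does not.
```python
# Minimal sketch (not the official inference pipeline): load the checkpoint as a
# plain BART seq2seq model and generate a candidate concept name. The repository ID
# and the START/END mention-marking format are assumptions; see the GitHub
# repository for the exact preprocessing and constrained decoding.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "dmis-lab/ANGEL_cometa"  # assumed Hub identifier for this card
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

context = "I have been dealing with START chest tightness END since last week."
inputs = tokenizer(context, return_tensors="pt")

outputs = model.generate(**inputs, num_beams=5, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```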
# Training
#### Training Data
The model was trained on the COMETA dataset, which contains biomedical entity mentions from online health discussions annotated with SNOMED CT concepts.
#### Training Procedure
- **Positive-only pre-training:** initial training using only positive examples, following the standard generative entity-linking approach.
- **Negative-aware training:** subsequent training that incorporates negative examples to improve the model's discriminative capabilities (see the illustrative sketch below).
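To make the negative-aware idea concrete, here is an illustrative sketch rather than the exact objective from the paper: the same seq2seq model scores the gold concept name and a set of negative candidate names, and a margin-style ranking loss pushes the gold name above the negatives. The base checkpoint, candidate strings, and loss form are placeholders.
```python
# Illustrative sketch only: one way to use negative candidates during training.
# The actual ANGEL objective may differ; consult the paper/repository for details.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")  # placeholder base model
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")

def sequence_log_prob(source: str, target: str) -> torch.Tensor:
    """Length-normalized log-probability of generating `target` from `source`."""
    enc = tokenizer(source, return_tensors="pt")
    labels = tokenizer(text_target=target, return_tensors="pt").input_ids
    out = model(**enc, labels=labels)
    return -out.loss  # loss is the mean token-level cross-entropy

source = "I have been dealing with START chest tightness END since last week."
positive = "tightness of chest"            # gold concept name (example)
negatives = ["chest pain", "dyspnea"]      # hard negative candidates (example)

pos_score = sequence_log_prob(source, positive)
neg_scores = torch.stack([sequence_log_prob(source, n) for n in negatives])

# Margin ranking loss: push the gold name above every negative by a margin.
margin = 1.0
loss = torch.clamp(margin - (pos_score - neg_scores), min=0).mean()
loss.backward()
```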
# Evaluation
#### Testing Data
The model was evaluated on the COMETA dataset.
#### Metrics
Accuracy at Top-1 (Acc@1): the percentage of mentions for which the model's top prediction matches the correct entity.
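For reference, Acc@1 can be computed as in the short sketch below; the data layout (ranked prediction lists and sets of gold concept identifiers) is hypothetical.
```python
# Minimal Acc@1 computation: fraction of mentions whose top-ranked prediction
# matches a gold concept identifier. Field layout here is hypothetical.
def accuracy_at_1(predictions, gold):
    """predictions: list of ranked ID lists; gold: list of sets of correct IDs."""
    hits = sum(1 for preds, answers in zip(predictions, gold)
               if preds and preds[0] in answers)
    return hits / len(gold)

# Example: 2 of 3 mentions have their top prediction in the gold set.
preds = [["C001"], ["C042"], ["C007"]]
gold = [{"C001"}, {"C099"}, {"C007"}]
print(accuracy_at_1(preds, gold))  # 0.666...
```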
#### Scores
<table border="1" cellspacing="0" cellpadding="5" style="width: 100%; text-align: center; border-collapse: collapse; margin-left: 0;">
<thead>
<tr>
<th><b>Dataset</b></th>
<th><b>BioSYN</b><br>(Sung et al., 2020)</th>
<th><b>SapBERT</b><br>(Liu et al., 2021)</th>
<th><b>GenBioEL</b><br>(Yuan et al., 2022b)</th>
<th><b>ANGEL<br>(Ours)</b></th>
</tr>
</thead>
<tbody>
<tr>
<td><b>COMETA</b></td>
<td>71.3</td>
<td>75.1</td>
<td>80.9</td>
<td><b>82.8</b></td>
</tr>
</tbody>
</table>
The GenBioEL score was reproduced in our experiments.
# Citation
If you use the ANGEL_cometa model, please cite:
```bibtex
@article{kim2024learning,
title={Learning from Negative Samples in Generative Biomedical Entity Linking},
author={Kim, Chanhwi and Kim, Hyunjae and Park, Sihyeon and Lee, Jiwoo and Sung, Mujeen and Kang, Jaewoo},
journal={arXiv preprint arXiv:2408.16493},
year={2024}
}
```
# Contact
For questions or issues, please contact chanhwi_kim@korea.ac.kr.