	---
	license: gpl-3.0
	language:
	- en
	metrics:
	- accuracy
	base_model: dmis-lab/ANGEL_pretrained
	---

	# Model Card for ANGEL_cometa
	This model card provides detailed information about the ANGEL_cometa model, designed for biomedical entity linking.


	# Model Details

	#### Model Description
	- Developed by: Chanhwi Kim, Hyunjae Kim, Sihyeon Park, Jiwoo Lee, Mujeen Sung, Jaewoo Kang
	- Model type: Generative Biomedical Entity Linking Model
	- Language(s): English
	- License: GPL-3.0
	- Finetuned from model: BART-large (Base architecture)

	#### Model Sources

	- Github Repository: https://github.com/dmis-lab/ANGEL
	- Paper: https://arxiv.org/pdf/2408.16493


	# Direct Use
	ANGEL_cometa is designed for biomedical entity linking: it links biomedical entity mentions in the COMETA dataset to their corresponding entities.
	To use this model, you need to set up a virtual environment and the inference code.
	Start by cloning our [ANGEL GitHub repository](https://github.com/dmis-lab/ANGEL).
	Then, run the following script to set up the environment:
	```bash
	bash script/environment/set_environment.sh
	```

	Then, if you want to run the model on a single sample, no preprocessing is required.
	Simply execute the run_sample.sh script:

	```bash
	bash script/inference/run_sample.sh cometa
	```

	To modify the sample with your own example, refer to the [Direct Use](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#direct-use) section in our GitHub repository.
	If you're interested in training or evaluating the model, check out the [Fine-tuning](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#fine-tuning) and [Evaluation](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#evaluation) sections.

	# Training

	#### Training Data
	The model was trained on the COMETA dataset, which contains annotated biomedical entity mentions.

	#### Training Procedure
	- Positive-only Pre-training: Initial training using only positive examples, following the standard approach.
	- Negative-aware Training: Subsequent training incorporated negative examples to improve the model's discriminative capabilities.

	# Evaluation

	### Testing Data
	The model was evaluated on the COMETA dataset.

	### Metrics
	Accuracy at Top-1 (Acc@1): Measures the percentage of times the model's top prediction matches the correct entity.
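
As a rough illustration of the metric described above, the following sketch computes Acc@1 from ranked candidate lists. It is not the evaluation code from the ANGEL repository: the function name `accuracy_at_1`, the input layout (one ranked list of entity identifiers per mention, one gold identifier per mention), and the identifiers in the example are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy_at_1(
    ranked_predictions: Sequence[Sequence[str]],
    gold_ids: Sequence[str],
) -> float:
    """Return Acc@1 as a percentage.

    ranked_predictions[i] is the ranked candidate list for mention i
    (best candidate first); gold_ids[i] is its correct entity identifier.
    Both layouts are assumptions for this sketch.
    """
    if len(ranked_predictions) != len(gold_ids):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold_ids:
        return 0.0
    # A mention counts as a hit only when the top-ranked candidate is the gold entity.
    hits = sum(
        1
        for candidates, gold in zip(ranked_predictions, gold_ids)
        if candidates and candidates[0] == gold
    )
    return 100.0 * hits / len(gold_ids)


# Hypothetical identifiers, for illustration only.
predictions = [["C1", "C7"], ["C2"], ["C9", "C3"], ["C4"]]
gold = ["C1", "C2", "C3", "C5"]
score = accuracy_at_1(predictions, gold)
print(f"Acc@1 = {score:.1f}")
```

In this toy example the third mention has the correct entity ranked second, so it does not count toward Acc@1.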

	### Scores

	<table border="1" cellspacing="0" cellpadding="5" style="width: 100%; text-align: center; border-collapse: collapse; margin-left: 0;">
	<thead>
	<tr>
	<th><b>Dataset</b></th>
	<th><b>BioSYN</b><br>(Sung et al., 2020)</th>
	<th><b>SapBERT</b><br>(Liu et al., 2021)</th>
	<th><b>GenBioEL</b><br>(Yuan et al., 2022b)</th>
	<th><b>ANGEL<br>(Ours)</b></th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td><b>COMETA</b></td>
	<td>71.3</td>
	<td>75.1</td>
	<td>80.9</td>
	<td><b>82.8</b></td>
	</tr>
	</tbody>
	</table>

	The GenBioEL scores are from our own reproduction.



	# Citation
	If you use the ANGEL_cometa model, please cite:

	```bibtex
	@article{kim2024learning,
	title={Learning from Negative Samples in Generative Biomedical Entity Linking},
	author={Kim, Chanhwi and Kim, Hyunjae and Park, Sihyeon and Lee, Jiwoo and Sung, Mujeen and Kang, Jaewoo},
	journal={arXiv preprint arXiv:2408.16493},
	year={2024}
	}
	```

	# Contact
	For questions or issues, please contact chanhwi_kim@korea.ac.kr.