metadata
license: apache-2.0
library_name: span-marker
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
pipeline_tag: token-classification
widget:
  - text: >-
      X-Linked adrenoleukodystrophy (ALD) is a genetic disease associated with
      demyelination of the central nervous system, adrenal insufficiency, and
      accumulation of very long chain fatty acids in tissue and body fluids.
    example_title: Example 1
  - text: >-
      Canavan disease is inherited as an autosomal recessive trait that is
      caused by the deficiency of aspartoacylase (ASPA).
    example_title: Example 2
  - text: >-
      However, both models lack other frequent DM symptoms including the
      fibre-type dependent atrophy, myotonia, cataract and male-infertility.
    example_title: Example 3
model-index:
  - name: SpanMarker w. bert-base-cased on NCBI Disease by Tom Aarsen
    results:
      - task:
          type: token-classification
          name: Named Entity Recognition
        dataset:
          type: ncbi_disease
          name: NCBI Disease
          split: test
          revision: acd0e6451198d5b615c12356ab6a05fff4610920
        metrics:
          - type: f1
            value: 0.8813
            name: F1
          - type: precision
            value: 0.8661
            name: Precision
          - type: recall
            value: 0.8971
            name: Recall
datasets:
- ncbi_disease
language:
- en
metrics:
- f1
- recall
- precision
SpanMarker for Disease Named Entity Recognition
This is a SpanMarker model trained on the ncbi_disease dataset. In particular, this SpanMarker model uses bert-base-cased as the underlying encoder. See train.py for the training script.
Metrics
This model achieves the following results on the NCBI Disease test set:
- Overall Precision: 0.8661
- Overall Recall: 0.8971
- Overall F1: 0.8813
- Overall Accuracy: 0.9837
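The overall F1 is the harmonic mean of the overall precision and recall, so the reported figures can be cross-checked against each other. A minimal sketch in plain Python; the helper name `f1_score` is illustrative and not part of SpanMarker:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the test set.
test_f1 = f1_score(0.8661, 0.8971)
print(f"{test_f1:.4f}")  # 0.8813
```

The same check applied to the precision and recall columns of the training results table reproduces the F1 values listed there.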
Labels
Label | Examples |
---|---|
DISEASE | "ataxia-telangiectasia", "T-cell leukaemia", "C5D", "neutrophilic leukocytosis", "pyogenic infection" |
Usage
To use this model for inference, first install the span_marker library:
pip install span_marker
You can then run inference with this model like so:
from span_marker import SpanMarkerModel
# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-ncbi-disease")
# Run inference
entities = model.predict("Canavan disease is inherited as an autosomal recessive trait that is caused by the deficiency of aspartoacylase (ASPA).")
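The SpanMarker documentation describes `predict` as returning a list of dictionaries, one per detected entity, with keys such as `span`, `label`, and `score`. Assuming that format, a small helper can filter the results by confidence. The helper `confident_spans` is hypothetical and not part of the library, and the sample list is a hand-written placeholder so the sketch runs without downloading the model; its scores are not real model output.

```python
from typing import Dict, List


def confident_spans(entities: List[Dict], min_score: float = 0.5) -> List[str]:
    """Return 'span (LABEL)' strings for entities at or above min_score."""
    return [
        f"{entity['span']} ({entity['label']})"
        for entity in entities
        if entity["score"] >= min_score
    ]


# Placeholder standing in for `entities = model.predict(...)`.
# The scores are made up for illustration only.
sample_entities = [
    {"span": "Canavan disease", "label": "DISEASE", "score": 0.99},
    {"span": "deficiency of aspartoacylase", "label": "DISEASE", "score": 0.30},
]

for line in confident_spans(sample_entities):
    print(line)
```

In practice, the list returned by `model.predict` would be passed in place of `sample_entities`, with the threshold chosen to suit the application.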
See the SpanMarker repository for documentation and additional information on this library.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
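With `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.1`, the learning rate ramps up linearly from 0 to 5e-05 over roughly the first 10% of training steps and then decays linearly back to 0. The sketch below reproduces that shape in plain Python. It assumes the warmup step count is the ratio times the total, rounded up. The `total_steps = 639` value is an estimate inferred from the training results table (600 steps at epoch 2.82), not a figure stated in the training script, and the function name is illustrative.

```python
import math


def linear_warmup_lr(step: int, total_steps: int,
                     base_lr: float = 5e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed total: about 213 steps per epoch over 3 epochs.
total_steps = 639
for step in (0, 32, 64, 300, 639):
    print(step, linear_warmup_lr(step, total_steps))
```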
Training results
Training Loss | Epoch | Step | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy |
---|---|---|---|---|---|---|---|
0.0038 | 1.41 | 300 | 0.0059 | 0.8141 | 0.8579 | 0.8354 | 0.9818 |
0.0018 | 2.82 | 600 | 0.0054 | 0.8315 | 0.8720 | 0.8513 | 0.9840 |
Framework versions
- SpanMarker 1.2.4
- Transformers 4.31.0
- PyTorch 1.13.1+cu117
- Datasets 2.14.3
- Tokenizers 0.13.2