SpanMarker

This is a SpanMarker model that can be used for Named Entity Recognition.

Model Details

Model Description

Model Type: SpanMarker
Maximum Sequence Length: 512 tokens
Maximum Entity Length: 16 words
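
To make the 16-word entity limit concrete, the following minimal sketch enumerates the contiguous word spans that fit under a given maximum length. The function name `candidate_spans` is hypothetical and not part of the SpanMarker API; the sketch only illustrates the constraint and does not claim to reproduce how the library builds its spans internally.

```python
def candidate_spans(words, max_entity_length=16):
    """Yield (start, end) word-index pairs, end exclusive, for every
    contiguous span of at most max_entity_length words."""
    n = len(words)
    for start in range(n):
        # A span may neither exceed the length limit nor run past the sentence.
        for end in range(start + 1, min(start + max_entity_length, n) + 1):
            yield start, end


words = "The Power of Habit by Charles Duhigg".split()
# A small limit is used here so the result is easy to inspect by hand.
spans = list(candidate_spans(words, max_entity_length=3))
```

An entity longer than the limit has no matching candidate span, so it cannot be returned as a single entity under this constraint.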

Model Sources

Repository: SpanMarker on GitHub (https://github.com/tomaarsen/SpanMarkerNER)
Thesis: SpanMarker For Named Entity Recognition

Model Labels

Label	Examples
person	"Barney Glaser", "Malcolm Gladwell", "Charles Duhigg"
publication_date	"2000", "1967", "2018"
publisher	"Little , Brown and Company", "Sociology Press", "Avery"
work_of_art	"`` The Tipping Point : How Little Things Can Make a Big Difference ''", "`` The Power of Habit ''", "`` The Discovery of Grounded Theory ''"

Evaluation

Metrics

Label	Precision	Recall	F1
all	0.0	0.0	0.0
person	0.0	0.0	0.0
publication_date	0.0	0.0	0.0
publisher	0.0	0.0	0.0
work_of_art	0.0	0.0	0.0
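
For reference, precision, recall, and F1 for entity spans can be computed as in the sketch below. It assumes exact-match scoring, where a prediction counts only if its boundaries and label both match a gold span; this card does not state how its metrics were computed, so treat this as an illustration of the definitions. The function `span_prf` and the sample spans are hypothetical.

```python
def span_prf(gold, predicted):
    """Return (precision, recall, f1) for exact-match entity spans.

    gold and predicted are sets of (start, end, label) tuples.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical spans for illustration only (not taken from any evaluation set).
gold = {(0, 2, "person"), (5, 6, "publication_date"), (8, 10, "publisher")}
predicted = {(0, 2, "person"), (5, 6, "publisher")}
precision, recall, f1 = span_prf(gold, predicted)
```

Under this definition, all three scores are 0.0 whenever no predicted span exactly matches a gold span.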

Uses

Direct Use for Inference

from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span_marker_model_id")
# Run inference
entities = model.predict("\"The Pragmatic Turn\" (2020, University of Pennsylvania Press) provides key insights into pragmatist philosophy, edited by John J. Stuhr . For provocative science, try \"Introducing Consciousness\", Alex Westrin and Vidyut Lokhande's 2018 work published via Icon Books, challenging dominant models of self-awareness.")
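
The prediction result is a list of dictionaries, one per detected entity. The sketch below groups such a list by label so that the parts of a citation can be read off together. The sample entities are hand-written for illustration and are not actual output of this model; the keys "span", "label", and "score" are assumed to match the SpanMarker output format, and `group_by_label` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Collect entity texts per label, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Hand-written sample in the assumed output format, NOT real model output.
sample_entities = [
    {"span": "John J. Stuhr", "label": "person", "score": 0.98},
    {"span": "2020", "label": "publication_date", "score": 0.95},
    {"span": "University of Pennsylvania Press", "label": "publisher", "score": 0.91},
    {"span": "Icon Books", "label": "publisher", "score": 0.40},
]
citations = group_by_label(sample_entities, min_score=0.5)
```

The `min_score` threshold is a design choice for this sketch; a suitable value depends on how the model's scores behave on your own data.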

Downstream Use

You can fine-tune this model on your own dataset.

from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span_marker_model_id")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003") # For example CoNLL2003

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span_marker_model_id-finetuned")
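
Training data for this kind of fine-tuning consists of word-level tokens paired with integer tags. As a minimal sketch of that shape, the following converts word-index span annotations into a record with "tokens" and "ner_tags" fields. The helper `to_record`, the label-to-id mapping, and the choice of an unprefixed tagging scheme with "O" as id 0 are assumptions for illustration; check the label list of the dataset you actually train on.

```python
# Assumed label inventory: "O" plus the four labels listed in this card.
LABELS = ["O", "person", "publication_date", "publisher", "work_of_art"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}


def to_record(tokens, spans):
    """Build one training record.

    spans is a list of (start, end, label) word-index triples, end exclusive.
    """
    ner_tags = [LABEL2ID["O"]] * len(tokens)
    for start, end, label in spans:
        for i in range(start, end):
            ner_tags[i] = LABEL2ID[label]
    return {"tokens": tokens, "ner_tags": ner_tags}


record = to_record(
    ["Malcolm", "Gladwell", "published", "it", "in", "2000", "."],
    [(0, 2, "person"), (5, 6, "publication_date")],
)
```

A list of such records can then be turned into a dataset object with whatever loader you already use.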

Training Details

Training Set Metrics

Training set	Min	Median	Max
Sentence length	47	104.6034	200
Entities per sentence	3	4.0036	5

Training Hyperparameters

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3
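
As a rough illustration of the linear schedule, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup, since none is listed above, and takes the total of 1689 steps from the last row of the Training Results table. The function `linear_lr` is a hypothetical stand-in, not the scheduler code used in training.

```python
def linear_lr(step, total_steps=1689, base_lr=2e-5, warmup_steps=0):
    """Learning rate under linear decay to zero, with optional linear warmup."""
    if warmup_steps and step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start and at the end of each of the three epochs.
lrs = [linear_lr(step) for step in (0, 563, 1126, 1689)]
```

With these assumptions the rate falls by one third of its initial value per epoch and reaches zero at the final step.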

Training Results

Epoch	Step	Validation Loss	Validation Accuracy
1.0	563	0.0206	0.8513
2.0	1126	0.0173	0.8513
3.0	1689	0.0162	0.8513

Framework Versions

Python: 3.10.13
SpanMarker: 1.5.1.dev
Transformers: 4.39.3
PyTorch: 2.1.2
Datasets: 2.16.0
Tokenizers: 0.15.0

Citation

BibTeX

@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}

ClovenDoug/span-marker-tiny-bert-ner-citation-grabber
