---
license: mit
base_model: xlm-roberta-base
datasets:
- xtreme
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: roberta-base-NER
results:
- task:
name: Token Classification
type: token-classification
dataset:
name: xtreme
type: xtreme
config: PAN-X.en
split: validation
args: PAN-X.en
metrics:
- name: Precision
type: precision
value: 0.8003614625330182
- name: Recall
type: recall
value: 0.8110735418427726
- name: F1
type: f1
value: 0.8056818976978517
- name: Accuracy
type: accuracy
value: 0.9194332683336213
language:
- en
---
# roberta-base-NER
## Model description
**roberta-base-NER** is a **Named Entity Recognition** model obtained by fine-tuning the XLM-RoBERTa base model.
It has been trained to recognize three types of entities: locations (LOC), organizations (ORG), and persons (PER).
Specifically, this model is an *xlm-roberta-base* checkpoint that was fine-tuned on named-entity data aggregated from 10 high-resource languages.
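The evaluation metrics reported in this card are measured on the PAN-X.en validation split of XTREME (see the metadata above). As a quick orientation, this is how that split can be loaded with 🤗 Datasets; the exact multilingual training mixture is not reproduced here:
```python
from datasets import load_dataset

# English subset of WikiANN (PAN-X) as packaged in the XTREME benchmark;
# the training run aggregated several such language subsets.
panx_en = load_dataset("xtreme", "PAN-X.en")

# Each example carries whitespace-split tokens and integer IOB2 tags
print(panx_en["validation"][0]["tokens"])
print(panx_en["validation"][0]["ner_tags"])
```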
## Intended uses & limitations
#### How to use
You can use this model with the Transformers *pipeline* for NER.
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the fine-tuned checkpoint and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("Tirendaz/multilingual-xlm-roberta-for-ner")
model = AutoModelForTokenClassification.from_pretrained("Tirendaz/multilingual-xlm-roberta-for-ner")

# Build a token-classification (NER) pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "My name is Wolfgang and I live in Berlin"
ner_results = nlp(example)
print(ner_results)
```
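By default the pipeline returns one prediction per sub-word piece. If you prefer whole entity spans, newer Transformers releases accept an `aggregation_strategy` argument (a sketch; the older equivalent flag was `grouped_entities=True`):
```python
from transformers import pipeline

# Merge sub-word pieces into whole entity spans instead of per-token predictions
nlp_grouped = pipeline(
    "ner",
    model="Tirendaz/multilingual-xlm-roberta-for-ner",
    aggregation_strategy="simple",
)
print(nlp_grouped("My name is Wolfgang and I live in Berlin"))
```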
| Abbreviation | Description                                                     |
|:------------:|:----------------------------------------------------------------|
| O            | Outside of a named entity                                        |
| B-PER        | Beginning of a person's name right after another person's name   |
| I-PER        | Person's name                                                    |
| B-ORG        | Beginning of an organization right after another organization    |
| I-ORG        | Organization                                                     |
| B-LOC        | Beginning of a location right after another location             |
| I-LOC        | Location                                                         |
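When decoding raw model outputs yourself (rather than via the pipeline), the per-token tags above have to be collapsed into spans. Below is a minimal sketch of IOB2 decoding; `tags_to_spans` is an illustrative helper, not part of this repository:
```python
def tags_to_spans(tokens, tags):
    """Collapse IOB2 tags (O, B-PER, I-PER, ...) into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):                  # a new entity starts here
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)          # continuation of the same entity
        else:                                     # "O" or an inconsistent I- tag
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

print(tags_to_spans(
    ["Wolfgang", "lives", "in", "Berlin"],
    ["B-PER", "O", "O", "B-LOC"],
))  # [('PER', 'Wolfgang'), ('LOC', 'Berlin')]
```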
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
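The training script itself is not included with the model, but the values above map onto the Hugging Face `TrainingArguments` roughly as follows (an illustrative sketch, not the exact configuration used):
```python
from transformers import TrainingArguments

# Illustrative reconstruction of the hyperparameters listed above
training_args = TrainingArguments(
    output_dir="roberta-base-NER",
    learning_rate=2e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    num_train_epochs=5,
    seed=42,
    lr_scheduler_type="linear",   # linear decay with the Adam defaults listed above
    evaluation_strategy="epoch",  # assumption: the results table reports per-epoch eval
)
```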
### Training results
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log | 1.0 | 417 | 0.3359 | 0.7286 | 0.7675 | 0.7476 | 0.8991 |
| 0.4227 | 2.0 | 834 | 0.2951 | 0.7711 | 0.7980 | 0.7843 | 0.9131 |
| 0.2818 | 3.0 | 1251 | 0.2824 | 0.7852 | 0.8076 | 0.7962 | 0.9174 |
| 0.2186 | 4.0 | 1668 | 0.2853 | 0.7934 | 0.8150 | 0.8041 | 0.9193 |
| 0.1801 | 5.0 | 2085 | 0.2935 | 0.8004 | 0.8111 | 0.8057 | 0.9194 |
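Precision, recall, and F1 in the table are entity-level scores of the kind typically computed with `seqeval`, while accuracy is token-level. A small sketch of computing the same metrics from aligned gold and predicted tag sequences (toy data, not the actual evaluation set):
```python
# pip install seqeval  -- assumption: the card's metrics follow seqeval's entity-level scoring
from seqeval.metrics import precision_score, recall_score, f1_score, accuracy_score

y_true = [["B-PER", "O", "O", "B-LOC"]]   # gold IOB2 tags, one list per sentence
y_pred = [["B-PER", "O", "O", "B-LOC"]]   # model predictions aligned to the same tokens

print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(accuracy_score(y_true, y_pred))     # token-level accuracy
```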
### Framework versions
- Transformers 4.33.0
- Pytorch 2.0.0
- Datasets 2.1.0
- Tokenizers 0.13.3