Jean-Baptiste committed ea64d5f (parent: 12cca13): Update README.md

README.md CHANGED
@@ -24,16 +24,18 @@ Training data was classified as follows:
 
 Abbreviation|Description
 -|-
-O|
-MISC |
-PER
-ORG
-LOC
 
 In order to simplify, the prefix B- or I- from original conll2003 was removed.
 I used the train and test dataset from original conll2003 for training and the "validation" dataset for validation. This resulted in a dataset of size:
-
-
 
 ## How to use camembert-ner with HuggingFace
 
@@ -90,31 +92,31 @@ nlp("Apple was founded in 1976 by Steve Jobs, Steve Wozniak and Ronald Wayne to
 
 ## Model performances
 
 Model performances computed on conll2003 validation dataset (computed on the token predictions)
-
-entity
-
-PER
-
-
-
-
-
 
 On private dataset (email, chat, informal discussion), computed on word predictions:
-```
-entity | precision | recall | f1
-- | - | - | -
-PER | 0.8823 | 0.9116 | 0.8967
-ORG | 0.7694 | 0.7292 | 0.7487
-LOC | 0.8619 | 0.7768 | 0.8171
-```
-
-
-
-
-
-
-
-
 
New version (lines 24-41):

 
 Abbreviation|Description
 -|-
+O |Outside of a named entity
+MISC |Miscellaneous entity
+PER |Person’s name
+ORG |Organization
+LOC |Location
 
 In order to simplify, the prefix B- or I- from original conll2003 was removed.
 I used the train and test dataset from original conll2003 for training and the "validation" dataset for validation. This resulted in a dataset of size:
+
+Train | Validation
+-|-
+17494 | 3250
 
 ## How to use camembert-ner with HuggingFace
 
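The Train/Validation sizes above line up with the standard conll2003 split sizes; a minimal sketch, assuming the usual 14041/3250/3453 sentence counts for train/validation/test (an assumption, not stated in the README):

```python
# Assumed standard conll2003 sentence counts per split (not stated in the
# README itself): train 14041, validation 3250, test 3453.
splits = {"train": 14041, "validation": 3250, "test": 3453}

# The README trains on train + test and validates on "validation".
train_size = splits["train"] + splits["test"]
validation_size = splits["validation"]

print(train_size, validation_size)  # → 17494 3250
```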
New version (lines 92-113):

 ## Model performances
 
 Model performances computed on conll2003 validation dataset (computed on the token predictions)
+
+entity|precision|recall|f1
+-|-|-|-
+PER|0.9914|0.9927|0.9920
+ORG|0.9627|0.9661|0.9644
+LOC|0.9795|0.9862|0.9828
+MISC|0.9292|0.9262|0.9277
+Overall|0.9740|0.9766|0.9753
+
 
 On private dataset (email, chat, informal discussion), computed on word predictions:
 
+entity|precision|recall|f1
+-|-|-|-
+PER|0.8823|0.9116|0.8967
+ORG|0.7694|0.7292|0.7487
+LOC|0.8619|0.7768|0.8171
+
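As a quick sanity check, the f1 column in the table above is the harmonic mean of the precision and recall columns:

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# PER row of the private-dataset table: precision 0.8823, recall 0.9116.
print(round(f1(0.8823, 0.9116), 4))  # → 0.8967
```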
+By comparison on the same private dataset, Spacy (en_core_web_trf-3.2.0) gave:
+
+entity|precision|recall|f1
+-|-|-|-
+PER|0.9146|0.8287|0.8695
+ORG|0.7655|0.6437|0.6993
+LOC|0.8727|0.6180|0.7236
+
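A small sketch comparing the two private-dataset tables, taking the F1 values as given above (delta is camembert-ner minus Spacy):

```python
# F1 scores copied from the two private-dataset tables above.
camembert_f1 = {"PER": 0.8967, "ORG": 0.7487, "LOC": 0.8171}
spacy_f1 = {"PER": 0.8695, "ORG": 0.6993, "LOC": 0.7236}

# Positive delta means camembert-ner is ahead of en_core_web_trf-3.2.0.
deltas = {entity: round(camembert_f1[entity] - spacy_f1[entity], 4)
          for entity in camembert_f1}
print(deltas)  # → {'PER': 0.0272, 'ORG': 0.0494, 'LOC': 0.0935}
```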