metadata
tags:
  - spacy
  - dacy
  - danish
  - token-classification
  - pos tagging
  - morphological analysis
  - lemmatization
  - dependency parsing
  - named entity recognition
  - coreference resolution
  - named entity linking
  - named entity disambiguation
language:
  - da
license: apache-2.0
model-index:
  - name: da_dacy_large_trf-0.2.0
    results:
      - task:
          name: NER
          type: token-classification
        metrics:
          - name: NER Precision
            type: precision
            value: 0.8858195212
          - name: NER Recall
            type: recall
            value: 0.8620071685
          - name: NER F Score
            type: f_score
            value: 0.8737511353
        dataset:
          name: DaNE
          split: test
          type: dane
      - task:
          name: TAG
          type: token-classification
        metrics:
          - name: TAG (XPOS) Accuracy
            type: accuracy
            value: 0.9913668347
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: POS
          type: token-classification
        metrics:
          - name: POS (UPOS) Accuracy
            type: accuracy
            value: 0.9908174469
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: MORPH
          type: token-classification
        metrics:
          - name: Morph (UFeats) Accuracy
            type: accuracy
            value: 0.9880227568
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: LEMMA
          type: token-classification
        metrics:
          - name: Lemma Accuracy
            type: accuracy
            value: 0.9589423796
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: UNLABELED_DEPENDENCIES
          type: token-classification
        metrics:
          - name: Unlabeled Attachment Score (UAS)
            type: f_score
            value: 0.9280885781
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: LABELED_DEPENDENCIES
          type: token-classification
        metrics:
          - name: Labeled Attachment Score (LAS)
            type: f_score
            value: 0.9079997669
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: SENTS
          type: token-classification
        metrics:
          - name: Sentences F-Score
            type: f_score
            value: 1
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: coreference-resolution
          type: coreference-resolution
        metrics:
          - name: LEA
            type: f_score
            value: 0.4672143289
        dataset:
          name: DaCoref
          type: alexandrainst/dacoref
          split: custom
      - task:
          name: named-entity-linking
          type: named-entity-linking
        metrics:
          - name: Named entity Linking Precision
            type: precision
            value: 0.84
          - name: Named entity Linking Recall
            type: recall
            value: 0.2153846154
          - name: Named entity Linking F Score
            type: f_score
            value: 0.3428571429
        dataset:
          name: DaNED
          type: named-entity-linking
          split: custom
library_name: spacy
datasets:
  - universal_dependencies
  - dane
  - alexandrainst/dacoref
metrics:
  - accuracy

DaCy large

DaCy is a Danish natural language processing framework that provides state-of-the-art pipelines along with functionality for analysing Danish pipelines. DaCy's largest pipeline has achieved state-of-the-art performance on part-of-speech tagging and dependency parsing on the Danish Dependency Treebank, as well as competitive performance on named entity recognition, named entity disambiguation and coreference resolution. To read more, check out the DaCy repository, which has material on how to use DaCy and reproduce the results. DaCy also contains guides on using the package, as well as behavioural tests for biases and robustness of Danish NLP pipelines.
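
As a rough sketch of how a pipeline like this is typically used from spaCy, the snippet below loads it by package name and reads the standard token and entity attributes. It assumes the `da_dacy_large_trf` package and a compatible spaCy version are already installed. `load_pipeline`, `token_rows`, `entity_rows` and `demo` are hypothetical helper names, and the Danish sentence is only an illustrative input.

```python
def load_pipeline(name: str = "da_dacy_large_trf"):
    """Load an installed spaCy pipeline package by name.

    Assumption: the package has already been installed in this environment.
    """
    import spacy  # imported lazily so the helpers below work without spaCy

    return spacy.load(name)


def token_rows(doc):
    """Collect (text, coarse POS, lemma, dependency label) for every token."""
    return [(t.text, t.pos_, t.lemma_, t.dep_) for t in doc]


def entity_rows(doc):
    """Collect (text, entity label, knowledge-base id) for every entity.

    kb_id_ is an empty string when no knowledge-base entry was assigned.
    """
    return [(e.text, e.label_, e.kb_id_) for e in doc.ents]


def demo():
    """Run the pipeline on one sentence and print the annotations."""
    nlp = load_pipeline()
    doc = nlp("Kenneth arbejder på Aarhus Universitet.")
    for row in token_rows(doc):
        print(row)
    for row in entity_rows(doc):
        print(row)
```

Calling `demo()` runs the whole pipeline on the sample sentence; nothing is loaded until then, so the helpers can be imported without the model being present.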

| Feature | Description |
| --- | --- |
| **Name** | `da_dacy_large_trf` |
| **Version** | `0.2.0` |
| **spaCy** | `>=3.5.2,<3.6.0` |
| **Default Pipeline** | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` |
| **Components** | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | UD Danish DDT v2.11 (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br>DaNE (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br>DaCoref (Buch-Kromann, Matthias)<br>DaNED (Barrett, M. J., Lam, H., Wu, M., Lacroix, O., Plank, B., & Søgaard, A.)<br>chcaa/dfm-encoder-large-v1 (The Danish Foundation Models team) |
| **License** | Apache-2.0 |
| **Author** | Kenneth Enevoldsen |
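
The spaCy requirement listed above (`>=3.5.2,<3.6.0`) can be checked before loading the pipeline. The following is a minimal stdlib-only sketch, not part of DaCy or spaCy: `release_tuple`, `is_compatible` and `installed_spacy_is_compatible` are hypothetical helper names. It compares only the numeric release part of the version, so pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib import metadata

# Bounds copied from the requirement "spaCy >=3.5.2,<3.6.0".
LOWER = (3, 5, 2)  # inclusive
UPPER = (3, 6, 0)  # exclusive


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release, padded to three parts.

    Any suffix after the digits (e.g. 'rc1', '.dev0') is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def is_compatible(version: str) -> bool:
    """True if the version falls inside the supported range."""
    return LOWER <= release_tuple(version) < UPPER


def installed_spacy_is_compatible() -> bool:
    """Check the spaCy version installed in the current environment."""
    try:
        return is_compatible(metadata.version("spacy"))
    except metadata.PackageNotFoundError:
        return False
```

For example, `is_compatible("3.5.4")` is true, while `is_compatible("3.6.0")` and `is_compatible("3.5.1")` are false.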

Label Scheme

View label scheme (211 labels for 4 components)
Component Labels
tagger ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
morphologizer AdpType=Prep|POS=ADP, Definite=Ind|Gender=Com|Number=Sing|POS=NOUN, Mood=Ind|POS=AUX|Tense=Pres|VerbForm=Fin|Voice=Act, POS=PROPN, Definite=Ind|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part, Definite=Def|Gender=Neut|Number=Sing|POS=NOUN, POS=SCONJ, Definite=Def|Gender=Com|Number=Sing|POS=NOUN, Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Act, POS=ADV, Number=Plur|POS=DET|PronType=Dem, Degree=Pos|Number=Plur|POS=ADJ, Definite=Ind|Gender=Com|Number=Plur|POS=NOUN, POS=PUNCT, NumType=Ord|POS=ADJ, POS=CCONJ, Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN, POS=VERB|VerbForm=Inf|Voice=Act, Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs, Degree=Sup|POS=ADV, Degree=Pos|POS=ADV, Gender=Com|Number=Sing|POS=DET|PronType=Ind, Number=Plur|POS=DET|PronType=Ind, POS=ADP, POS=ADV|PartType=Inf, Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs, Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act, Definite=Def|Degree=Pos|Number=Sing|POS=ADJ, Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs, Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act, POS=ADP|PartType=Inf, Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ, NumType=Card|POS=NUM, Degree=Pos|POS=ADJ, Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part, POS=PART|PartType=Inf, Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes, Definite=Def|Gender=Com|Number=Plur|POS=NOUN, Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN, Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs, POS=VERB|Tense=Pres|VerbForm=Part, Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs, Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN, Definite=Def|Degree=Sup|Number=Plur|POS=ADJ, Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs, POS=AUX|VerbForm=Inf|Voice=Act, Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ, Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ, Degree=Cmp|POS=ADJ, POS=PRON|PartType=Inf, Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ, 
Case=Nom|Gender=Com|POS=PRON|PronType=Ind, Number=Plur|POS=PRON|PronType=Ind, POS=INTJ, Gender=Com|Number=Sing|POS=DET|PronType=Dem, Case=Gen|Number=Plur|POS=DET|PronType=Ind, Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass, Definite=Def|Gender=Neut|Number=Plur|POS=NOUN, Degree=Cmp|POS=ADV, Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form, Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs, Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes, Case=Gen|POS=PROPN, Gender=Neut|Number=Sing|POS=PRON|PronType=Ind, Number=Plur|POS=VERB|Tense=Past|VerbForm=Part, Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes, Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs, Definite=Def|Degree=Sup|POS=ADJ, Gender=Neut|Number=Sing|POS=DET|PronType=Ind, Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN, Gender=Neut|Number=Sing|POS=DET|PronType=Dem, Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part, POS=PRON|PronType=Dem, Degree=Pos|Gender=Com|Number=Sing|POS=ADJ, Number=Plur|POS=NUM, POS=VERB|VerbForm=Inf|Voice=Pass, Definite=Def|Degree=Sup|Number=Sing|POS=ADJ, Number=Sing|POS=PRON|PronType=Int,Rel, Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs, Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs, Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs, POS=PRON, Definite=Ind|Number=Sing|POS=NOUN, Definite=Ind|Number=Sing|POS=NUM, Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN, Foreign=Yes|POS=ADV, POS=NOUN, Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN, Gender=Com|Number=Plur|POS=NOUN, Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel, Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs, Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs, Gender=Com|Number=Sing|POS=PRON|PronType=Ind, Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN, 
Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ, Degree=Sup|POS=ADJ, Degree=Pos|Number=Sing|POS=ADJ, Mood=Imp|POS=VERB, Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs, Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs, POS=X, Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN, Number=Plur|POS=PRON|PronType=Dem, Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs, Number=Plur|POS=PRON|PronType=Int,Rel, Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes, Degree=Cmp|Number=Plur|POS=ADJ, Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs, Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form, Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs, Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs, Gender=Com|POS=PRON|PronType=Int,Rel, Case=Gen|Degree=Pos|Number=Plur|POS=ADJ, Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes, POS=VERB|VerbForm=Ger, Gender=Com|Number=Sing|POS=PRON|PronType=Dem, Case=Gen|POS=PRON|PronType=Int,Rel, Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass, Abbr=Yes|POS=X, Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN, Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs, Definite=Ind|Number=Plur|POS=NOUN, Foreign=Yes|POS=X, Number=Plur|POS=PRON|PronType=Rcp, Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs, Case=Gen|Degree=Cmp|POS=ADJ, Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN, Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs, Gender=Neut|Number=Sing|POS=PRON|PronType=Dem, Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form, Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form, Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes, 
Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs, Case=Gen|Number=Plur|POS=PRON|PronType=Rcp, POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs, POS=SYM, POS=DET|PronType=Dem, Gender=Com|Number=Sing|POS=NUM, Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs, Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part, Definite=Def|Degree=Abs|POS=ADJ, POS=VERB|Tense=Pres, Definite=Ind|Gender=Neut|Number=Sing|POS=NUM, Degree=Abs|POS=ADV, Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ, Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel, POS=VERB|Tense=Past|VerbForm=Part, Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ, Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs, Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs, Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs, Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs, Definite=Ind|POS=NOUN, Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind, Definite=Ind|Gender=Com|Number=Sing|POS=NUM, Definite=Def|Number=Plur|POS=NOUN, Case=Gen|POS=NOUN, POS=AUX|Tense=Pres|VerbForm=Part
parser ROOT, acl:relcl, advcl, advmod, advmod:lmod, amod, appos, aux, case, cc, ccomp, compound:prt, conj, cop, dep, det, expl, fixed, flat, iobj, list, mark, nmod, nmod:poss, nsubj, nummod, obj, obl, obl:lmod, obl:tmod, punct, xcomp
ner LOC, MISC, ORG, PER
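
The morphologizer labels above follow the Universal Dependencies feature notation: `Name=Value` pairs joined by `|`, where a value may itself hold several comma-separated alternatives (for example `PronType=Int,Rel`). The sketch below shows one way to split such a listing and turn a single label into a dictionary. `split_labels` and `parse_morph_label` are hypothetical helpers written for illustration, not part of spaCy or DaCy.

```python
import re


def split_labels(listing: str) -> list:
    """Split a flattened 'A, B, C' listing into individual labels.

    Labels are separated by a comma followed by whitespace, whereas the
    commas inside a multi-valued feature (e.g. 'Int,Rel') have no space,
    so splitting on comma + whitespace keeps those values intact.
    """
    cleaned = listing.strip().strip(",")
    return [item for item in re.split(r",\s+", cleaned) if item]


def parse_morph_label(label: str) -> dict:
    """Turn 'Name=Value|Name=Value' into {name: [values, ...]}."""
    features = {}
    for pair in label.split("|"):
        name, _, value = pair.partition("=")
        features[name] = value.split(",")
    return features


labels = split_labels(
    "POS=SCONJ, Number=Sing|POS=PRON|PronType=Int,Rel, POS=ADV, "
)
print(labels)
# ['POS=SCONJ', 'Number=Sing|POS=PRON|PronType=Int,Rel', 'POS=ADV']
print(parse_morph_label(labels[1]))
# {'Number': ['Sing'], 'POS': ['PRON'], 'PronType': ['Int', 'Rel']}
```

Keeping every value as a list makes single-valued and multi-valued features uniform to work with.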

Accuracy

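| Type | Score |
| --- | --- |
| `TOKEN_ACC` | 99.92 |
| `TOKEN_P` | 99.70 |
| `TOKEN_R` | 99.77 |
| `TOKEN_F` | 99.74 |
| `SENTS_P` | 100.00 |
| `SENTS_R` | 100.00 |
| `SENTS_F` | 100.00 |
| `TAG_ACC` | 99.14 |
| `POS_ACC` | 99.08 |
| `MORPH_ACC` | 98.80 |
| `MORPH_MICRO_P` | 99.45 |
| `MORPH_MICRO_R` | 99.32 |
| `MORPH_MICRO_F` | 99.39 |
| `DEP_UAS` | 92.81 |
| `DEP_LAS` | 90.80 |
| `ENTS_P` | 88.58 |
| `ENTS_R` | 86.20 |
| `ENTS_F` | 87.38 |
| `LEMMA_ACC` | 95.89 |
| `COREF_LEA_F1` | 46.72 |
| `COREF_LEA_PRECISION` | 45.91 |
| `COREF_LEA_RECALL` | 47.56 |
| `NEL_SCORE` | 34.29 |
| `NEL_MICRO_P` | 84.00 |
| `NEL_MICRO_R` | 21.54 |
| `NEL_MICRO_F` | 34.29 |
| `NEL_MACRO_P` | 86.71 |
| `NEL_MACRO_R` | 24.70 |
| `NEL_MACRO_F` | 37.28 |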
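
The F-scores reported above are consistent with the harmonic mean of the corresponding precision and recall, F = 2PR / (P + R). The short sketch below recomputes three of them from the numbers in this card. `f_score` is a hypothetical helper written for illustration, not a function taken from spaCy or DaCy.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the scores reported in this card.
ner_f = f_score(0.8858195212, 0.8620071685)  # reported NER F: 0.8737511353
nel_f = f_score(0.84, 0.2153846154)          # reported NEL F: 0.3428571429
coref_f = f_score(45.91, 47.56)              # reported COREF_LEA_F1: 46.72

print(round(ner_f, 4), round(nel_f, 4), round(coref_f, 2))
# 0.8738 0.3429 46.72
```

This also makes the entity linking result easier to read: precision is high (84.00), but the low recall (21.54) pulls the combined F-score down to 34.29.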

Training

This model was trained using spaCy and logged to Weights & Biases. You can find all the training logs here.