Update README.md
README.md
CHANGED
@@ -27,104 +27,90 @@ model-index:
|
|
27 |
name: f1 macro
|
28 |
args:
|
29 |
average: macro
|
30 |
---
|
31 |
|
32 |
-
#
|
33 |
|
34 |
-
[
|
|
|
35 |
|
36 |
-
|
37 |
|
38 |
-
39 |
|
40 |
-
|
41 |
|
42 |
-
|
43 |
|
44 |
-
|
45 |
|
46 |
-
|
47 |
-
|
|
|
48 |
|
49 |
-
|
50 |
-
|------------------------------------------------------|----------------------------------------------------------------|
|
51 |
-
| memory | |
|
52 |
-
| steps | [('feature_extraction', ColumnTransformer(transformers=[('abbreviations',<br /> <__main__.ELFAbbreviationTransformer object at 0x7f38e082e4f0>,<br /> 0),<br /> ('tokenizer',<br /> CountVectorizer(binary=True, lowercase=False,<br /> tokenizer=<__main__.LegalEntityTokenizer object at 0x7f38e082ee50>),<br /> 0)])), ('classifier', ComplementNB())] |
|
53 |
-
| verbose | False |
|
54 |
-
| feature_extraction | ColumnTransformer(transformers=[('abbreviations',<br /> <__main__.ELFAbbreviationTransformer object at 0x7f38e082e4f0>,<br /> 0),<br /> ('tokenizer',<br /> CountVectorizer(binary=True, lowercase=False,<br /> tokenizer=<__main__.LegalEntityTokenizer object at 0x7f38e082ee50>),<br /> 0)]) |
|
55 |
-
| classifier | ComplementNB() |
|
56 |
-
| feature_extraction__n_jobs | |
|
57 |
-
| feature_extraction__remainder | drop |
|
58 |
-
| feature_extraction__sparse_threshold | 0.3 |
|
59 |
-
| feature_extraction__transformer_weights | |
|
60 |
-
| feature_extraction__transformers | [('abbreviations', <__main__.ELFAbbreviationTransformer object at 0x7f38e082e4f0>, 0), ('tokenizer', CountVectorizer(binary=True, lowercase=False,<br /> tokenizer=<__main__.LegalEntityTokenizer object at 0x7f38e082ee50>), 0)] |
|
61 |
-
| feature_extraction__verbose | False |
|
62 |
-
| feature_extraction__verbose_feature_names_out | True |
|
63 |
-
| feature_extraction__abbreviations | <__main__.ELFAbbreviationTransformer object at 0x7f38e082e4f0> |
|
64 |
-
| feature_extraction__tokenizer | CountVectorizer(binary=True, lowercase=False,<br /> tokenizer=<__main__.LegalEntityTokenizer object at 0x7f38e082ee50>) |
|
65 |
-
| feature_extraction__abbreviations__elf_abbreviations | <__main__.ELFAbbreviations object at 0x7f38f438b670> |
|
66 |
-
| feature_extraction__abbreviations__jurisdiction | PL |
|
67 |
-
| feature_extraction__abbreviations__use_endswith | True |
|
68 |
-
| feature_extraction__abbreviations__use_lowercasing | True |
|
69 |
-
| feature_extraction__tokenizer__analyzer | word |
|
70 |
-
| feature_extraction__tokenizer__binary | True |
|
71 |
-
| feature_extraction__tokenizer__decode_error | strict |
|
72 |
-
| feature_extraction__tokenizer__dtype | <class 'numpy.int64'> |
|
73 |
-
| feature_extraction__tokenizer__encoding | utf-8 |
|
74 |
-
| feature_extraction__tokenizer__input | content |
|
75 |
-
| feature_extraction__tokenizer__lowercase | False |
|
76 |
-
| feature_extraction__tokenizer__max_df | 1.0 |
|
77 |
-
| feature_extraction__tokenizer__max_features | |
|
78 |
-
| feature_extraction__tokenizer__min_df | 1 |
|
79 |
-
| feature_extraction__tokenizer__ngram_range | (1, 1) |
|
80 |
-
| feature_extraction__tokenizer__preprocessor | |
|
81 |
-
| feature_extraction__tokenizer__stop_words | |
|
82 |
-
| feature_extraction__tokenizer__strip_accents | |
|
83 |
-
| feature_extraction__tokenizer__token_pattern | (?u)\b\w\w+\b |
|
84 |
-
| feature_extraction__tokenizer__tokenizer | <__main__.LegalEntityTokenizer object at 0x7f38e082ee50> |
|
85 |
-
| feature_extraction__tokenizer__vocabulary | |
|
86 |
-
| classifier__alpha | 1.0 |
|
87 |
-
| classifier__class_prior | |
|
88 |
-
| classifier__fit_prior | True |
|
89 |
-
| classifier__norm | False |
|
90 |
|
91 |
-
|
|
|
|
|
|
|
92 |
|
93 |
-
|
94 |
|
95 |
-
|
96 |
|
97 |
-
98 |
|
99 |
-
## Evaluation Results
|
100 |
|
101 |
-
|
102 |
|
103 |
-
|
104 |
-
|
105 |
-
| f1 | 0.971647 |
|
106 |
-
| f1 macro | 0.522164 |
|
107 |
|
108 |
-
#
|
109 |
|
110 |
-
|
111 |
-
|
112 |
-
# Model Card Authors
|
113 |
-
|
114 |
-
This model card was written by the following authors:
|
115 |
-
|
116 |
-
[More Information Needed]
|
117 |
-
|
118 |
-
# Model Card Contact
|
119 |
-
|
120 |
-
You can contact the model card authors through the following channels:
|
121 |
-
[More Information Needed]
|
122 |
-
|
123 |
-
# Citation
|
124 |
-
|
125 |
-
Below you can find information related to citation.
|
126 |
-
|
127 |
-
**BibTeX:**
|
128 |
-
```
|
129 |
-
[More Information Needed]
|
130 |
-
```
|
|
|
27 |
name: f1 macro
|
28 |
args:
|
29 |
average: macro
|
30 |
+
widget:
|
31 |
+
- text: "INSTYTUT DIABETOLOGII SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ"
|
32 |
+
- text: '"METAL-SYSTEM" OGRODZENIA - SCHODY SŁAWOMIR BINKOWSKI'
|
33 |
+
- text: "GERLACH S.A."
|
34 |
+
- text: "EMU SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ SPÓŁKA KOMANDYTOWA"
|
35 |
+
- text: "JEREMIE SEED CAPITAL WOJEWÓDZTWA POMORSKIEGO FUNDUSZ INWESTYCYJNY ZAMKNIĘTY W LIKWIDACJI"
|
36 |
+
- text: "MIASTO BIELSKO-BIAŁA"
|
37 |
+
- text: 'MARKETING" KRYSTIAN GDOWKA, ARTUR OSTRĘGA SPÓŁKA JAWNA'
|
38 |
+
- text: "Bank Spółdzielczy w Poddębicach"
|
39 |
+
- text: 'Fundacja Dzieciom "POMAGAJ"'
|
40 |
+
- text: "KANCELARIA RADCÓW PRAWNYCH BRUDKIEWICZ, SUCHECKA SPÓŁKA KOMANDYTOWO-AKCYJNA"
|
41 |
+
- text: "AKADEMIA MARYNARKI WOJENNEJ IM. BOHATERÓW WESTERPLATTE"
|
42 |
+
- text: "ZGROMADZENIE SIÓSTR URSZULANEK UNII RZYMSKIEJ DOM ZAKONNY"
|
43 |
+
- text: "STOWARZYSZENIE AUTORÓW ZAIKS"
|
44 |
+
- text: "SKAT TRANSPORT PROSTA SPÓŁKA AKCYJNA"
|
45 |
+
- text: "Nationale-Nederlanden Dobrowolny Fundusz Emerytalny Nasze Jutro 2055"
|
46 |
+
- text: "STORY HOUSE EGMONT SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ"
|
47 |
+
- text: "Narodowy Fundusz Ochrony Środowiska i Gospodarki Wodnej"
|
48 |
+
- text: 'ORGANIZACJA ZAKŁADOWA NSZZ "SOLIDARNOŚĆ" NR 3395 W T-MOBILE POLSKA S.A.'
|
49 |
+
- text: "CI GAMES SPÓŁKA EUROPEJSKA"
|
50 |
+
- text: "PPK Pocztylion 2040 Dobrowolny Fundusz Emerytalny"
|
51 |
+
- text: "TOWARZYSTWO UBEZPIECZEŃ WZAJEMNYCH POLSKI ZAKŁAD UBEZPIECZEŃ WZAJEMNYCH"
|
52 |
+
- text: "KABANEK JANINA POTORSKA ROBERT POTORSKI"
|
53 |
+
- text: "SPÓŁDZIELCZA KASA OSZCZĘDNOŚCIOWO-KREDYTOWA ENERGIA"
|
54 |
+
- text: "SZOSTEK_BAR I PARTNERZY KANCELARIA PRAWNA"
|
55 |
+
- text: "MIEJSKI ZARZĄD BUDYNKÓW MIESZKALNYCH"
|
56 |
+
- text: "IZBA ADWOKACKA W KATOWICACH"
|
57 |
+
- text: '1. Niepubliczny Specjalistyczny Zakład Opieki Zdrowotnej "LUNG" Krzysztof Garbino 2. Drukarnia "GARBINO"'
|
58 |
---
|
59 |
|
60 |
+
# LENU - Legal Entity Name Understanding for Poland
|
61 |
|
62 |
+
A Polish BERT (uncased) model fine-tuned on Polish legal entity names (jurisdiction PL) from the Global [Legal Entity Identifier](https://www.gleif.org/en/about-lei/introducing-the-legal-entity-identifier-lei)
|
63 |
+
(LEI) System with the goal of detecting [Entity Legal Form (ELF) Codes](https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list).
|
64 |
|
65 |
+
---------------
|
66 |
|
67 |
+
<h1 align="center">
|
68 |
+
<a href="https://gleif.org">
|
69 |
+
<img src="http://sdglabs.ai/wp-content/uploads/2022/07/gleif-logo-new.png" width="220px" style="display: inherit">
|
70 |
+
</a>
|
71 |
+
</h1><br>
|
72 |
+
<h3 align="center">in collaboration with</h3>
|
73 |
+
<h1 align="center">
|
74 |
+
<a href="https://sociovestix.com">
|
75 |
+
<img src="https://sociovestix.com/img/svl_logo_centered.svg" width="700px" style="width: 100%">
|
76 |
+
</a>
|
77 |
+
</h1><br>
|
78 |
|
79 |
+
---------------
|
80 |
|
81 |
+
## Model Description
|
82 |
|
83 |
+
<!-- Provide a longer summary of what this model is. -->
|
84 |
|
85 |
+
The model was created as part of a collaboration between the [Global Legal Entity Identifier Foundation](https://gleif.org) (GLEIF) and
|
86 |
+
[Sociovestix Labs](https://sociovestix.com) with the goal of exploring how machine learning can help detect the ELF Code based solely on an entity's legal name and legal jurisdiction.
|
87 |
+
See also the open-source Python library [lenu](https://github.com/Sociovestix/lenu), which supports this task.
|
88 |
|
89 |
+
The model has been trained on the dataset [lenu](https://huggingface.co/datasets/Sociovestix), with a focus on Polish legal entities and ELF Codes within the jurisdiction "PL".
|
90 |
|
91 |
+
- **Developed by:** [GLEIF](https://gleif.org) and [Sociovestix Labs](https://huggingface.co/Sociovestix)
|
92 |
+
- **License:** Creative Commons (CC0) license
|
93 |
+
- **Fine-tuned from model:** dkleczek/bert-base-polish-uncased-v1
|
94 |
+
- **Resources for more information:** [Press Release](https://www.gleif.org/en/newsroom/press-releases/machine-learning-new-open-source-tool-developed-by-gleif-and-sociovestix-labs-enables-organizations-everywhere-to-automatically-)
|
95 |
|
96 |
+
# Uses
|
97 |
|
98 |
+
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
|
99 |
|
100 |
+
An entity's legal form is a crucial component when verifying and screening organizational identity.
|
101 |
+
The wide variety of entity legal forms that exist within and between jurisdictions, however, has made it difficult for large organizations to capture legal form as structured data.
|
102 |
+
The jurisdiction-specific models of [lenu](https://github.com/Sociovestix/lenu), trained on entities from
|
103 |
+
GLEIF’s Legal Entity Identifier (LEI) database of over two million records, will allow banks,
|
104 |
+
investment firms, corporations, governments, and other large organizations to retrospectively analyze
|
105 |
+
their master data, extract the legal form from the unstructured text of the legal name and
|
106 |
+
uniformly apply an ELF code to each entity type, according to the ISO 20275 standard.
|
107 |
|
|
|
108 |
|
109 |
+
# Licensing Information
|
110 |
|
111 |
+
This model, which is trained on LEI data, is available under the Creative Commons (CC0) license.
|
112 |
+
See [gleif.org/en/about/open-data](https://gleif.org/en/about/open-data).
|
|
|
|
|
113 |
|
114 |
+
# Recommendations
|
115 |
|
116 |
+
Users should always consider the score of the suggested ELF Codes. For low scores, it may be necessary to manually review the affected entities.
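
The score-based review recommended here could be sketched as follows. This is a minimal illustration, not part of the model or of the lenu library: the function name `split_by_score`, the `label`/`score` dictionary keys, the placeholder labels, and the threshold of `0.8` are all assumptions made for this example. A suitable threshold would need to be chosen from validation data.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical prediction record: a suggested ELF Code plus its score.
Prediction = Dict[str, Union[str, float]]


def split_by_score(
    predictions: List[Prediction], threshold: float = 0.8
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted and manual-review lists.

    The 0.8 default is an assumed value for illustration only.
    """
    accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for prediction in predictions:
        if float(prediction["score"]) >= threshold:
            accepted.append(prediction)
        else:
            needs_review.append(prediction)
    return accepted, needs_review


# Placeholder labels and scores, not real model output.
example_predictions: List[Prediction] = [
    {"name": "GERLACH S.A.", "label": "ELF_CODE_A", "score": 0.97},
    {"name": "MIASTO BIELSKO-BIAŁA", "label": "ELF_CODE_B", "score": 0.41},
]

accepted, needs_review = split_by_score(example_predictions)
for prediction in needs_review:
    print(f"Review manually: {prediction['name']} (score {prediction['score']})")
```

Entities in `needs_review` would then be checked by hand before an ELF Code is assigned.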