Files changed (1)
README.md +98 -0
README.md CHANGED
@@ -87,6 +87,104 @@ model = AutoModelForTokenClassification.from_pretrained("xlm-roberta-large-finet
classifier = pipeline("ner", model=model, tokenizer=tokenizer)
classifier("Hello I'm Omar and I live in Zürich.")

+ [{'end': 14,
+   'entity': 'I-PER',
+   'index': 5,
+   'score': 0.9999175,
+   'start': 10,
+   'word': '▁Omar'},
+  {'end': 35,
+   'entity': 'I-LOC',
+   'index': 10,
+   'score': 0.9999906,
+   'start': 29,
+   'word': '▁Zürich'}]
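
The `start` and `end` fields in this output are character offsets into the input string. A quick sketch confirming how they index the text (the offsets come straight from the output above):

```python
# 'start'/'end' are Python slice offsets into the original input string.
text = "Hello I'm Omar and I live in Zürich."
assert text[10:14] == "Omar"    # span of the '▁Omar' entity
assert text[29:35] == "Zürich"  # span of the '▁Zürich' entity
```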
+ from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
+ tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large-finetuned-conll03-english")
+ model = AutoModelForTokenClassification.from_pretrained("xlm-roberta-large-finetuned-conll03-english")
+ classifier = pipeline("ner", model=model, tokenizer=tokenizer)
+ classifier("Alya told Jasmine that Andrew could pay with cash..")
+ [{'end': 2,
+   'entity': 'I-PER',
+   'index': 1,
+   'score': 0.9997861,
+   'start': 0,
+   'word': '▁Al'},
+  {'end': 4,
+   'entity': 'I-PER',
+   'index': 2,
+   'score': 0.9998591,
+   'start': 2,
+   'word': 'ya'},
+  {'end': 16,
+   'entity': 'I-PER',
+   'index': 4,
+   'score': 0.99995816,
+   'start': 10,
+   'word': '▁Jasmin'},
+  {'end': 17,
+   'entity': 'I-PER',
+   'index': 5,
+   'score': 0.9999584,
+   'start': 16,
+   'word': 'e'},
+  {'end': 29,
+   'entity': 'I-PER',
+   'index': 7,
+   'score': 0.99998057,
+   'start': 23,
+   'word': '▁Andrew'}]
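
Note that the tokenizer splits names into subword pieces here ('▁Al' + 'ya', '▁Jasmin' + 'e'). Where whole-entity spans are preferred, a minimal sketch using the pipeline's standard `aggregation_strategy` option (not part of this diff):

```python
# Sketch: group contiguous subword predictions into whole entities.
# aggregation_strategy="simple" is a standard transformers NER-pipeline option.
from transformers import pipeline

classifier = pipeline(
    "ner",
    model="xlm-roberta-large-finetuned-conll03-english",
    aggregation_strategy="simple",
)
classifier("Alya told Jasmine that Andrew could pay with cash..")
# Returns entries shaped like {'entity_group': 'PER', 'word': 'Alya', ...}
# rather than separate '▁Al' and 'ya' pieces.
```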
+
+ ## Recommendations
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
+
+ ## Training
+ See the following resources for training data and training procedure details:
+
+ - XLM-RoBERTa-large model card
+ - CoNLL-2003 data card
+ - Associated paper
+
+ ## Evaluation
+ See the associated paper for evaluation details.
+
+ ## Environmental Impact
+ Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
+
+ - Hardware Type: 500 32GB Nvidia V100 GPUs (from the associated paper)
+ - Hours used: More information needed
+ - Cloud Provider: More information needed
+ - Compute Region: More information needed
+ - Carbon Emitted: More information needed
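
A rough sketch of the arithmetic such a calculator performs (energy drawn times grid carbon intensity); every number below except the GPU count is a placeholder assumption, since the fields above are still "More information needed":

```python
# Lacoste et al. (2019)-style estimate: emissions = energy * carbon intensity.
# Only gpu_count comes from the model card; the rest are illustrative guesses.
gpu_count = 500              # 32GB Nvidia V100s, from the associated paper
gpu_power_kw = 0.30          # assumed ~300 W draw per V100
hours_used = 24.0            # placeholder: "More information needed"
kg_co2_per_kwh = 0.432       # assumed grid-average carbon intensity

energy_kwh = gpu_count * gpu_power_kw * hours_used
emissions_kg = energy_kwh * kg_co2_per_kwh
print(f"{energy_kwh:.0f} kWh -> ~{emissions_kg:.0f} kg CO2eq")
```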
+
+ ## Technical Specifications
+ See the associated paper for further details.
+
+ ## Citation
+ BibTeX:
+
+ @article{conneau2019unsupervised,
+   title={Unsupervised Cross-lingual Representation Learning at Scale},
+   author={Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm{\'a}n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin},
+   journal={arXiv preprint arXiv:1911.02116},
+   year={2019}
+ }
+
+ APA:
+
+ Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., ... & Stoyanov, V. (2019). Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.
+
+ ## Model Card Authors
+ This model card was written by the team at Hugging Face.
+
+ ## How to Get Started with the Model
+ Use the code below to get started with the model. You can use this model directly within a pipeline for NER.
+
+ <details>
+ <summary>Click to expand</summary>
+
+ from transformers import AutoTokenizer, AutoModelForTokenClassification
+ from transformers import pipeline
+ tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large-finetuned-conll03-english")
+ model = AutoModelForTokenClassification.from_pretrained("xlm-roberta-large-finetuned-conll03-english")
+ classifier = pipeline("ner", model=model, tokenizer=tokenizer)
+ classifier("Hello I'm Omar and I live in Zürich.")
+
[{'end': 14,
  'entity': 'I-PER',
  'index': 5,
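
For completeness beyond the pipeline wrapper, the same tagging can be reproduced with a raw forward pass; a minimal sketch using standard transformers calls (not part of the model card):

```python
# Sketch: token classification without the pipeline abstraction.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

name = "xlm-roberta-large-finetuned-conll03-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name)

inputs = tokenizer("Hello I'm Omar and I live in Zürich.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # (1, seq_len, num_labels)

predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    label = model.config.id2label[pred.item()]
    if label != "O":                         # keep only entity tokens
        print(token, label)
```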