---
license: apache-2.0
language: zh
tags:
- Token Classification
metrics:
- precision
- recall
- f1
- accuracy
---
## Model description
This model is a fine-tuned version of MacBERT for spell checking in medical application scenarios. We fine-tuned the Chinese base version of MacBERT on a 300M dataset that includes 60K+ authorized medical articles. To build the training data, we randomly corrupted 30% of the sentences in these articles by injecting noise in the form of visually or phonologically similar characters. The fine-tuned model achieves 96% accuracy on our test dataset.
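The noise-injection step described above can be sketched roughly as follows. This is only an illustration, not the authors' actual pipeline: the `CONFUSION` table, the function names `corrupt_sentence` and `build_examples`, the choice to replace a single character per corrupted sentence, and the 0/1 label scheme are all assumptions made for the example.

```python
import random

# Hypothetical confusion sets: each character maps to look-alike or
# sound-alike candidates. A real table would be far larger.
CONFUSION = {
    "硝": ["肖", "消"],
    "霉": ["每", "梅"],
    "镇": ["振", "真"],
}


def corrupt_sentence(sentence, rng):
    """Replace one confusable character; return (noisy_sentence, labels).

    labels[i] is 1 where a character was replaced and 0 elsewhere
    (an assumed labeling scheme for token classification).
    """
    chars = list(sentence)
    labels = [0] * len(chars)
    candidates = [i for i, ch in enumerate(chars) if ch in CONFUSION]
    if candidates:
        i = rng.choice(candidates)
        chars[i] = rng.choice(CONFUSION[chars[i]])
        labels[i] = 1
    return "".join(chars), labels


def build_examples(sentences, noise_ratio=0.3, seed=0):
    """Corrupt each sentence with probability `noise_ratio`; keep the rest clean."""
    rng = random.Random(seed)
    examples = []
    for sentence in sentences:
        if rng.random() < noise_ratio:
            examples.append(corrupt_sentence(sentence, rng))
        else:
            examples.append((sentence, [0] * len(sentence)))
    return examples
```

Sampling each sentence independently gives roughly, not exactly, 30% corrupted sentences. Fixing the seed keeps the generated dataset reproducible.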
## Intended uses & limitations
You can use this model directly with a pipeline for token classification:
```python
>>> from transformers import (AutoModelForTokenClassification, AutoTokenizer)
>>> from transformers import pipeline
>>> hub_model_id = "9pinus/macbert-base-chinese-medical-collation"
>>> model = AutoModelForTokenClassification.from_pretrained(hub_model_id)
>>> tokenizer = AutoTokenizer.from_pretrained(hub_model_id)
>>> classifier = pipeline('ner', model=model, tokenizer=tokenizer)
>>> result = classifier("如果病情较重，可适当口服甲肖唑片、环酯红霉素片等药物进行抗感染镇痛。")
>>> for item in result:
...     if item['entity'] == 1:
...         print(item)
...
{'entity': 1, 'score': 0.58127016, 'index': 14, 'word': '肖', 'start': 13, 'end': 14}
```
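To make the flagged positions easier to read, the `start` and `end` offsets in each result item can be used to mark suspicious characters in the original text. The sketch below is one possible approach. The helper names `flagged_spans` and `mark_errors` are hypothetical, and it assumes, as in the example output, that the label `1` denotes a suspected error. The sample `result` is hard-coded from the output above, so the snippet runs without downloading the model.

```python
def flagged_spans(result, error_label=1):
    """Return (start, end) character offsets of tokens tagged as errors."""
    return [
        (item["start"], item["end"])
        for item in result
        if item["entity"] == error_label
    ]


def mark_errors(text, spans, left="[", right="]"):
    """Wrap each flagged span in brackets so suspicious characters stand out."""
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(left + text[start:end] + right)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "如果病情较重，可适当口服甲肖唑片、环酯红霉素片等药物进行抗感染镇痛。"
# Sample pipeline output, hard-coded from the example output above.
result = [
    {"entity": 1, "score": 0.58127016, "index": 14,
     "word": "肖", "start": 13, "end": 14},
]
marked = mark_errors(text, flagged_spans(result))
print(marked)
```

Here the flagged character 肖 sits where 硝 would be expected in 甲硝唑 (metronidazole). The model only locates the suspicious character. Choosing a replacement is left to a downstream step.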
### Framework versions
- Transformers 4.15.0
- PyTorch 1.10.1+cu113
- Datasets 1.17.0
- Tokenizers 0.10.3