9pinus/macbert-base-chinese-medical-collation

@@ -3,78 +3,105 @@ license: apache-2.0
 language: en
 tags:
 - generated_from_trainer
 metrics:
 - precision
 - recall
 - f1
 - accuracy
-model-index:
-- name: macbert-finetuned-tokenclassification-errorword
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# macbert-finetuned-tokenclassification-errorword
-This model is a fine-tuned version of [shibing624/macbert4csc-base-chinese](https://huggingface.co/shibing624/macbert4csc-base-chinese) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.0040
-- Precision: 0.0
-- Recall: 0.0
-- F1: 0.0
-- Accuracy: 0.9994
 ## Model description
-This model fine-tuned on a large corpus of medical material which processed on purpose, we propose to sample words and use similar words to do replacement for masking purpose.
-As a result, this model can performed pretty well when applying on medical relatted downstream tasks.
 ## Intended uses & limitations
 You can use this model directly with a pipeline for token classification:
 ```python
-from transformers import (AutoModelForTokenClassification, AutoTokenizer
-from transformers import pipeline
-hub_model_id = "9pinus/macbert-base-chinese-medical-collation"
-model = AutoModelForTokenClassification.from_pretrained(hub_model_id)
-tokenizer = BertTokenizer.from_pretrained(hub_model_id)
-classifier = pipeline('ner', model=model, tokenizer=tokenizer)
-result = classifier("如果病情较重，可适当口服甲硝唑片、环酯红霉素片、吲哚美辛片等药物进行抗感染镇痛。同时在日常生活中要注意牙齿清洁卫生，养成刷牙的好习惯。")
-for item in result:
-    print(item)
 ```
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 16
-- eval_batch_size: 16
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- num_epochs: 8.0
-- mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch | Step   | Validation Loss | Precision | Recall | F1  | Accuracy |
-|:-------------:|:-----:|:------:|:---------------:|:---------:|:------:|:---:|:--------:|
-| 0.0038        | 1.0   | 36875  | 0.0030          | 0.0       | 0.0    | 0.0 | 0.9991   |
-| 0.0026        | 2.0   | 73750  | 0.0028          | 0.0       | 0.0    | 0.0 | 0.9992   |
-| 0.0021        | 3.0   | 110625 | 0.0033          | 0.0       | 0.0    | 0.0 | 0.9992   |
-| 0.0014        | 4.0   | 147500 | 0.0033          | 0.0       | 0.0    | 0.0 | 0.9993   |
-| 0.0009        | 5.0   | 184375 | 0.0033          | 0.0       | 0.0    | 0.0 | 0.9993   |
-| 0.0006        | 6.0   | 221250 | 0.0035          | 0.0       | 0.0    | 0.0 | 0.9994   |
-| 0.0004        | 7.0   | 258125 | 0.0037          | 0.0       | 0.0    | 0.0 | 0.9994   |
-| 0.0002        | 8.0   | 295000 | 0.0040          | 0.0       | 0.0    | 0.0 | 0.9994   |
 ### Framework versions
 - Transformers 4.15.0

 language: en
 tags:
 - generated_from_trainer
+- Token Classification
 metrics:
 - precision
 - recall
 - f1
 - accuracy
 ---
 ## Model description
+This model is a fine-tuned version of MacBERT for spell checking in medical application scenarios. We fine-tuned it on our own medical data, accumulated over the past several years, which includes 600,000 carefully edited medical articles. When building the dataset, we sampled 30% of these articles, randomly selected characters in them, and replaced those characters with spelling errors: characters that are either visually or phonologically similar to the originals. The resulting model achieves 90% accuracy on our test dataset.
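The corruption step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' preprocessing code: the `CONFUSIONS` table, the `char_rate` value, and the function names are hypothetical, and sampling each article independently with probability 0.3 is only one way to read "sample 30% of these articles".

```python
import random

# Hypothetical confusion set: each character maps to look-alike or sound-alike
# characters. These entries are illustrative, not the authors' actual resource.
CONFUSIONS = {
    "硝": ["肖", "消"],
    "霉": ["莓", "梅"],
    "痛": ["疼", "通"],
}


def corrupt(text, char_rate, rng):
    """Replace some characters with confusable ones; return (noisy_text, labels).

    labels[i] is 1 where the character at position i was replaced, else 0.
    """
    chars, labels = [], []
    for ch in text:
        candidates = CONFUSIONS.get(ch)
        if candidates and rng.random() < char_rate:
            chars.append(rng.choice(candidates))
            labels.append(1)
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels


def build_dataset(articles, article_rate=0.3, char_rate=0.15, seed=0):
    """Corrupt roughly `article_rate` of the articles; leave the rest clean."""
    rng = random.Random(seed)
    dataset = []
    for text in articles:
        if rng.random() < article_rate:
            dataset.append(corrupt(text, char_rate, rng))
        else:
            dataset.append((text, [0] * len(text)))
    return dataset
```

Each item pairs a possibly corrupted text with one 0/1 label per character, which is the shape a token-classification fine-tuning run would need.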
 ## Intended uses & limitations
 You can use this model directly with a pipeline for token classification:
 ```python
+>>> from transformers import AutoModelForTokenClassification, BertTokenizer, pipeline
+>>> hub_model_id = "9pinus/macbert-base-chinese-medical-collation"
+>>> model = AutoModelForTokenClassification.from_pretrained(hub_model_id)
+>>> tokenizer = BertTokenizer.from_pretrained(hub_model_id)
+>>> classifier = pipeline('ner', model=model, tokenizer=tokenizer)
+>>> # The input deliberately misspells 甲硝唑 (metronidazole) as 甲肖唑.
+>>> result = classifier("如果病情较重，可适当口服甲肖唑片、环酯红霉素片、吲哚美辛片等药物进行抗感染镇痛。同时在日常生活中要注意牙齿清洁卫生，养成刷牙的好习惯。")
+>>> for item in result:
+...     print(item)
+{'entity': 0, 'score': 0.9999982, 'index': 1, 'word': '如', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 2, 'word': '果', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 3, 'word': '病', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 4, 'word': '情', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 5, 'word': '较', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 6, 'word': '重', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999833, 'index': 7, 'word': '，', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 8, 'word': '可', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 9, 'word': '适', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 10, 'word': '当', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 11, 'word': '口', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 12, 'word': '服', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.9999982, 'index': 13, 'word': '甲', 'start': None, 'end': None}
+{'entity': 1, 'score': 0.901703, 'index': 14, 'word': '肖', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999833, 'index': 15, 'word': '唑', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 16, 'word': '片', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 17, 'word': '、', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 18, 'word': '环', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 19, 'word': '酯', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 20, 'word': '红', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 21, 'word': '霉', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 22, 'word': '素', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 23, 'word': '片', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 24, 'word': '、', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 25, 'word': '吲', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999833, 'index': 26, 'word': '哚', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.999998, 'index': 27, 'word': '美', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999833, 'index': 28, 'word': '辛', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 29, 'word': '片', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999833, 'index': 30, 'word': '等', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 31, 'word': '药', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 32, 'word': '物', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999833, 'index': 33, 'word': '进', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 34, 'word': '行', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 35, 'word': '抗', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 36, 'word': '感', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 37, 'word': '染', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 38, 'word': '镇', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 39, 'word': '痛', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999833, 'index': 40, 'word': '。', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 41, 'word': '同', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 42, 'word': '时', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999833, 'index': 43, 'word': '在', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 44, 'word': '日', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 45, 'word': '常', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 46, 'word': '生', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 47, 'word': '活', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 48, 'word': '中', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 49, 'word': '要', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 50, 'word': '注', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 51, 'word': '意', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 52, 'word': '牙', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 53, 'word': '齿', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 54, 'word': '清', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 55, 'word': '洁', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 56, 'word': '卫', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 57, 'word': '生', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 58, 'word': '，', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 59, 'word': '养', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 60, 'word': '成', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 61, 'word': '刷', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 62, 'word': '牙', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 63, 'word': '的', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 64, 'word': '好', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999845, 'index': 65, 'word': '习', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999857, 'index': 66, 'word': '惯', 'start': None, 'end': None}
+{'entity': 0, 'score': 0.99999833, 'index': 67, 'word': '。', 'start': None, 'end': None}
 ```
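Most of the output is label `0`, so in practice you will likely want only the flagged characters. The sketch below filters the pipeline result; it assumes that label `1` marks a suspected misspelling and that `index` is a 1-based token position with one token per character, as the sample output suggests. The helper name `suspected_errors` and the `min_score` threshold are hypothetical. If your Transformers version returns label strings instead of integers, adjust the comparison accordingly.

```python
# A few entries copied from the sample output above.
sample_result = [
    {'entity': 0, 'score': 0.9999982, 'index': 13, 'word': '甲', 'start': None, 'end': None},
    {'entity': 1, 'score': 0.901703, 'index': 14, 'word': '肖', 'start': None, 'end': None},
    {'entity': 0, 'score': 0.99999833, 'index': 15, 'word': '唑', 'start': None, 'end': None},
]


def suspected_errors(result, min_score=0.5):
    """Return (position, character, score) for tokens labelled as errors.

    Assumes label 1 means "suspected misspelling" and that `index` is the
    1-based token position, so `index - 1` is the 0-based character offset.
    """
    return [
        (item['index'] - 1, item['word'], item['score'])
        for item in result
        if item['entity'] == 1 and item['score'] >= min_score
    ]


print(suspected_errors(sample_result))
# [(13, '肖', 0.901703)]
```

The model only detects the suspicious position; choosing the correction (here 硝) is left to a downstream step.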
 ### Framework versions
 - Transformers 4.15.0