albert_chinese_base
This is the albert_chinese_base model from Google's GitHub repository,
converted with Hugging Face's conversion script.
Notice
Supports AutoTokenizer
Since the albert_chinese_base model does not use sentencepiece,
AlbertTokenizer cannot load its vocabulary, so you have to use BertTokenizer instead of AlbertTokenizer!
We can check that this approach is correct by running a MaskedLM prediction on an example.
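To illustrate why a BERT-style tokenizer fits here: a BERT-style vocabulary is a plain text file with one token per line, and the line index is the token id, whereas a sentencepiece tokenizer expects a trained model file. The sketch below is a simplified, pure-Python imitation of that lookup and not the library's implementation. The toy vocabulary, its ordering, and the helper names `load_vocab`, `simple_tokenize`, and `encode` are assumptions made up for illustration.

```python
import re

# Toy vocabulary in the one-token-per-line layout of a BERT-style vocab file.
# Tokens and their order are made up; the real vocabulary is far larger.
VOCAB_TXT = "[PAD]\n[UNK]\n[CLS]\n[SEP]\n[MASK]\n今\n天\n心\n情\n很\n好"


def load_vocab(text):
    """Map each token to its line index, which serves as the token id."""
    return {token: idx for idx, token in enumerate(text.split("\n"))}


def simple_tokenize(text):
    """Keep bracketed special tokens whole; split everything else per character."""
    return re.findall(r"\[[A-Z]+\]|\S", text)


def encode(text, vocab):
    """Wrap the tokens in [CLS] ... [SEP] and look up ids, falling back to [UNK]."""
    tokens = ["[CLS]"] + simple_tokenize(text) + ["[SEP]"]
    return [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]


vocab = load_vocab(VOCAB_TXT)
ids = encode("今天[MASK]情很好", vocab)
# Look up the mask position through the vocabulary rather than a fixed id.
mask_pos = ids.index(vocab["[MASK]"])
print(ids, mask_pos)
```

In this toy vocabulary the id of `[MASK]` is simply its line index, which is why looking it up by name is more robust than relying on a fixed number.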
Justify (verifying that it works)
from transformers import AutoTokenizer, AlbertForMaskedLM
import torch
from torch.nn.functional import softmax

pretrained = 'voidful/albert_chinese_base'
tokenizer = AutoTokenizer.from_pretrained(pretrained)
model = AlbertForMaskedLM.from_pretrained(pretrained)

inputtext = "今天[MASK]情很好"
# Encode once and find the [MASK] position via the tokenizer instead of hard-coding id 103
token_ids = tokenizer.encode(inputtext, add_special_tokens=True)
maskpos = token_ids.index(tokenizer.mask_token_id)
input_ids = torch.tensor(token_ids).unsqueeze(0)  # Batch size 1

# Inference only, so no labels and no gradient tracking are needed
with torch.no_grad():
    prediction_scores = model(input_ids)[0]  # shape: (batch, sequence, vocab)

# Turn the scores at the masked position into probabilities over the vocabulary
probs = softmax(prediction_scores[0, maskpos], dim=-1)
predicted_index = torch.argmax(probs).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
print(predicted_token, probs[predicted_index].item())
Result: 感 0.36333346366882324
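The number printed next to the predicted token is its softmax probability: the raw scores at the masked position are exponentiated and normalized over the whole vocabulary, and the highest-scoring token is reported. A minimal pure-Python sketch of that step follows. The candidate tokens and logit values are hypothetical and chosen only for illustration; they are not model output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidates and logits for the masked position (made-up numbers).
candidates = ["感", "心", "事", "天"]
logits = [2.0, 1.5, 0.3, -1.0]

probs = softmax(logits)
# The prediction is the candidate with the highest probability.
best = max(range(len(probs)), key=probs.__getitem__)
print(candidates[best], probs[best])
```

Because softmax preserves the ordering of the scores, taking the argmax of the probabilities selects the same token as taking the argmax of the raw scores.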