hajime9652 committed on
Commit
8511f3d
1 Parent(s): 522dddd

Update readme

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -19,11 +19,15 @@ metrics:
 This model require Mecab and senetencepiece with XLNetTokenizer.
 See details https://qiita.com/mkt3/items/4d0ae36f3f212aee8002
 
+This model uses NFKD as the normalization method for character encoding.
+Japanese muddle marks and semi-muddle marks will be lost.
+
+*日本語の濁点・半濁点がないモデルです*
+
 #### How to use
 
 ```python
-import MeCab
-import subprocess
+from fugashi import Tagger
 
 from transformers import (
     pipeline,
@@ -33,11 +37,7 @@ from transformers import (
 
 class XLNet():
     def __init__(self):
-        cmd = 'echo `mecab-config --dicdir`"/mecab-ipadic-neologd"'
-        path = (subprocess.Popen(cmd, stdout=subprocess.PIPE,
-                                 shell=True).communicate()[0]).decode('utf-8')
-        self.m = MeCab.Tagger(f"-Owakati -d {path}")
-
+        self.m = Tagger('-Owakati')
         self.gen_model = XLNetLMHeadModel.from_pretrained("hajime9652/xlnet-japanese")
         self.gen_tokenizer = XLNetTokenizer.from_pretrained("hajime9652/xlnet-japanese")
 
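Reviewer note on the NFKD lines added in this commit: the "muddle marks" and "semi-muddle marks" are the Japanese dakuten (濁点) and handakuten (半濁点), and the Japanese line says the model has no dakuten or handakuten. The sketch below shows one way these marks could disappear. It assumes the loss comes from NFKD decomposition followed by removal of combining characters, which is believed to match how `XLNetTokenizer` preprocesses text when accents are not kept. The commit does not confirm this. The helper name `strip_voicing_marks` is hypothetical and is not part of the model or of `transformers`.

```python
import unicodedata


def strip_voicing_marks(text: str) -> str:
    """Hypothetical helper: NFKD-normalize, then drop combining characters.

    NFKD splits a voiced kana such as U+304C into its base kana plus a
    combining voiced sound mark (U+3099 or U+309A). Dropping combining
    characters afterwards leaves only the unvoiced base kana.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # Compare each word before and after the assumed preprocessing.
    for word in ["がぎぐ", "ぱぴぷ", "パンダ", "東京"]:
        print(word, "->", strip_voicing_marks(word))
```

Under this assumption, input containing が is seen by the model as か, so text generated by the model will also lack these marks. Kanji and unvoiced kana pass through unchanged.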