med_tokenizer_wakati / token_fraction_describe.txt
              token          text    token/text
count  25350.000000  25350.000000  25350.000000
mean     116.634162    234.688008      0.511913
std      127.626086    252.295032      0.093063
min       12.000000     41.000000      0.269231
10%       26.000000     50.000000      0.422222
25%       36.000000     66.000000      0.450172
33%       41.000000     77.000000      0.462040
50%       73.000000    151.000000      0.489130
67%      130.000000    268.000000      0.523070
75%      164.000000    334.000000      0.546373
80%      189.000000    387.000000      0.565789
90%      267.000000    539.000000      0.642857
95%      340.550000    682.000000      0.722222
99%      500.000000    995.510000      0.792785
max    10133.000000  18800.000000      1.023810
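
The layout of the summary above matches what pandas `DataFrame.describe` prints when given a custom percentile list. The sketch below shows one way such a table could be produced. It is an illustration under assumptions, not the script actually used for this file: the function name `token_fraction_describe` and the toy input values are hypothetical, and it assumes that `token` is a per-sample token count, `text` is a per-sample text length, and `token/text` is their row-wise ratio.

```python
import pandas as pd

# Percentiles that appear as row labels in the summary (10% ... 99%).
PERCENTILES = [0.10, 0.25, 0.33, 0.50, 0.67, 0.75, 0.80, 0.90, 0.95, 0.99]


def token_fraction_describe(token_counts, text_lengths):
    """Summarise token counts, text lengths, and their per-row ratio.

    Hypothetical helper: the column meanings are assumed, not confirmed
    by the source file.
    """
    df = pd.DataFrame({"token": token_counts, "text": text_lengths})
    # The ratio is computed per row, then summarised like any other column.
    df["token/text"] = df["token"] / df["text"]
    return df.describe(percentiles=PERCENTILES)


# Toy input only; these four rows are not taken from the real dataset.
summary = token_fraction_describe([12, 26, 73, 164], [41, 50, 151, 334])
print(summary)
```

Because the ratio is computed per row before summarising, each statistic in the `token/text` column describes the distribution of ratios and is not the ratio of the corresponding `token` and `text` statistics. For example, in the table above the maximum ratio is 1.023810, while 10133 / 18800 is about 0.539.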