README.md · dahara1/gemma-2-27b-it-gguf-japanese-imatrix at de2d26eff0520e6d853c632cf2179fac29331db4

metadata

tags:
  - gemma
  - llm

gemma-2-27b-itを日本語が多く含まれる重要度行列(iMatrix)を使って量子化したgguf版です。日本語対応能力が多めに保持されている事を期待していますが確かめる事はまだ出来ていません
This is a quantized gguf version of gemma-2-27b-it using an importance matrix (iMatrix) that contains many Japanese words. I hope it retains more Japanese support, but I can't be sure yet.

使い方(How to use)

公式マニュアルに従ってllama.cppをビルドします
Build llama.cpp according to the official manual

ダウンロードしたモデルを指定して下記コマンドを実行します

llama.cpp\build\bin\Release\llama-server -m .\gemma-2-27b-it-Q4_K_M.gguf

ブラウザでhttp://127.0.0.1:8080を開きます
Open http://127.0.0.1:8080 in your browser

その他の疑問など Other questions etc.

Q4_K_Mをwiki.test.raw(英語)を使って計測したperplexityスコアが他の同等GGUF量子化モデルに比べて優れている事は確認済ですが理由はまだわかりません。
I have already confirmed that the perplexity score of Q4_K_M measured using wiki.test.raw is superior to other equivalent GGUF quantization models, but I don't know why yet.

解明されていない疑問はあります
There are unanswered questions.

llama.cppの不具合対応がほぼ完了した後に作成したからperplexityが低くなったのか？
(Was the perplexity low because it was created after the llama.cpp defects were almost completed?)
iMatrixは量子化強度が高いモデルでなければ効果があまりないという説もあるが多言語の観点からもそれは正しいのか？
(Some say that iMatrix is not very effective unless the model has high quantization strength, but is that true from a multilingual point of view)
wiki.test.raw(英語)でperplexityを計測することにどこまで意味があるのか？
(How far does it make sense to measure perplexity with wiki.test.raw (English)?)

Window 11のCMD/PowerShellでは日本語が化けてしまう問題がある事を確認しています。 We have confirmed a problem that Japanese is garbled in CMD/PowerShell on Window 11.