passaglia committed
Commit b617fe5
1 Parent(s): d6ac9a3

Update README.md

Files changed (1):
  1. README.md +8 -10
README.md CHANGED
@@ -20,9 +20,10 @@ Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama
 For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).
 
 ## Quantization
-We performed quantization using [llama.cpp](https://github.com/ggerganov/llama.cpp) and converted the model to GGUF format. Currently, we only offer quantized models in the Q4_K_M format.
 
-We have prepared two quantized model options, GGUF and AWQ. Here is the table measuring the performance degradation due to quantization.
+We have prepared two quantized model options, GGUF and AWQ. This is the GGUF (Q4_K_M) model, converted using [llama.cpp](https://github.com/ggerganov/llama.cpp).
+
+Here is a table showing the performance degradation due to quantization.
 
 | Model | ELYZA-tasks-100 GPT4 score |
 | :-------------------------------- | ---: |
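For reference, a GGUF conversion and Q4_K_M quantization like the one this hunk describes can be reproduced with llama.cpp's own tooling. The sketch below is an assumption, not the exact commands ELYZA ran, and the tool names (`convert_hf_to_gguf.py`, `llama-quantize`) have been renamed across llama.cpp releases.

```bash
# Sketch only: assumes a recent, built llama.cpp checkout; tool names vary by version.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Convert the original Hugging Face checkpoint to a 16-bit GGUF file.
python convert_hf_to_gguf.py /path/to/Llama-3-ELYZA-JP-8B \
    --outfile Llama-3-ELYZA-JP-8B-f16.gguf

# Re-quantize the 16-bit GGUF down to Q4_K_M.
./llama-quantize Llama-3-ELYZA-JP-8B-f16.gguf \
    Llama-3-ELYZA-JP-8B-q4_k_m.gguf Q4_K_M
```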
@@ -37,6 +38,7 @@ Install llama.cpp through brew (works on Mac and Linux)
 ```bash
 brew install llama.cpp
 ```
+
 Invoke the llama.cpp server.
 
 ```bash
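The fenced block that opens at the end of this hunk is cut off by the diff context. With the Homebrew build, invoking the server looks roughly like the sketch below; `llama-server` and the `--hf-repo`/`--hf-file` flags are taken from current llama.cpp and may differ on older installs.

```bash
# Sketch: fetch the GGUF from the Hub and serve an OpenAI-compatible API.
# Flags are assumptions; verify with `llama-server --help` on your version.
llama-server \
    --hf-repo elyza/Llama-3-ELYZA-JP-8B-GGUF \
    --hf-file Llama-3-ELYZA-JP-8B-q4_k_m.gguf \
    --port 8080
```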
@@ -83,17 +85,13 @@ completion = client.chat.completions.create(
 
 ## Use with Desktop App
 
-There are various desktop applications that can handle GGUF models, but here we will introduce how to use a model in a local environment without coding by using LM Studio.
+There are various desktop applications that can handle GGUF models, but here we will introduce how to use the model in the no-code environment LM Studio.
 
 - **Installation**: Download and install [LM Studio](https://lmstudio.ai/).
 - **Downloading the Model**: Search for `elyza/Llama-3-ELYZA-JP-8B-GGUF` in the search bar on the home page 🏠, and download `Llama-3-ELYZA-JP-8B-q4_k_m.gguf`.
-- **Start Chatting**: Click on 💬 in the sidebar, select `Llama-3-ELYZA-JP-8B-GGUF` from "Select a Model to load" in the header, and load the model. Now you can freely chat with the local LLM.
-- **Setting Options**: You can set options from the sidebar on the right. Faster inference can be achieved by setting Quick GPU Offload Settings to Max in the GPU Settings.
-- **For Developers, Starting the API Server**: Click `<->` in the left sidebar and move to the Local Server tab. Select the model and click Start Server to launch an OpenAI API-compatible API server.
-
-## Quantization Options
-
-Currently, we only offer quantized models in the Q4_K_M format.
+- **Start Chatting**: Click on 💬 in the sidebar, select `Llama-3-ELYZA-JP-8B-GGUF` from "Select a Model to load" in the header, and load the model. You can now freely chat with the local LLM.
+- **Setting Options**: You can set options from the sidebar on the right. Faster inference can be achieved by setting Quick GPU Offload to Max in the GPU Settings.
+- **For Developers: Starting the API Server**: Click `<->` in the left sidebar and move to the Local Server tab. Select the model and click Start Server to launch an OpenAI API-compatible API server.
 
 ## Developers
 
 