elyza
/

Llama-3-ELYZA-JP-8B-GGUF

Inference Endpoints

Model card Files Files and versions Community

passaglia commited on Jun 25

Commit

bad0264

•

1 Parent(s): a32eb2a

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -88,7 +88,7 @@ There are various desktop applications that can handle GGUF models, but here we
 - **Downloading the Model**: Search for `elyza/Llama-3-ELYZA-JP-8B-GGUF` in the search bar on the home page 🏠, and download `Llama-3-ELYZA-JP-8B-q4_k_m.gguf`.
 - **Start Chatting**: Click on 💬 in the sidebar, select `Llama-3-ELYZA-JP-8B-GGUF` from "Select a Model to load" in the header, and load the model. You can now freely chat with the local LLM.
 - **Setting Options**: You can set options from the sidebar on the right. Faster inference can be achieved by setting Quick GPU Offload to Max in the GPU Settings.
-- **（For Developers） Starting an API Server**: Click `<->` in the left sidebar and move to the Local Server tab. Select the model and click Start Server to launch an OpenAI API-compatible API server.
 ## Developers

 - **Downloading the Model**: Search for `elyza/Llama-3-ELYZA-JP-8B-GGUF` in the search bar on the home page 🏠, and download `Llama-3-ELYZA-JP-8B-q4_k_m.gguf`.
 - **Start Chatting**: Click on 💬 in the sidebar, select `Llama-3-ELYZA-JP-8B-GGUF` from "Select a Model to load" in the header, and load the model. You can now freely chat with the local LLM.
 - **Setting Options**: You can set options from the sidebar on the right. Faster inference can be achieved by setting Quick GPU Offload to Max in the GPU Settings.
+- **(For Developers) Starting an API Server**: Click `<->` in the left sidebar and move to the Local Server tab. Select the model and click Start Server to launch an OpenAI API-compatible API server.
 ## Developers