pere committed
Commit 475050a
Parent(s): 2a842d5

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +4 −4

README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 ---
 
 # NB-Llama-3.2-3B-Q4_K_M-GGUF
-This model is a **quantized** version of the original [NB-Llama-3.2-3B](https://huggingface.co/north/nb-llama-3.2-3B), converted into the **GGUF format** using [llama.cpp](https://github.com/ggerganov/llama.cpp). Quantization significantly reduces the model's memory footprint, enabling efficient inference on a wide range of hardware, including personal devices, without compromising too much quality. These quantized models are mainly provided so that people can test out the models with moderate hardware. If you want to benchmark the models or further finetune the models, we strongly recommend the non-quantized versions.
+This model is a **quantized** version of the original [NB-Llama-3.2-3B](https://huggingface.co/NbAiLab/nb-llama-3.2-3B), converted into the **GGUF format** using [llama.cpp](https://github.com/ggerganov/llama.cpp). Quantization significantly reduces the model's memory footprint, enabling efficient inference on a wide range of hardware, including personal devices, without compromising too much quality. These quantized models are mainly provided so that people can test out the models with moderate hardware. If you want to benchmark the models or further finetune the models, we strongly recommend the non-quantized versions.
 
 ## What is `llama.cpp`?
 [`llama.cpp`](https://github.com/ggerganov/llama.cpp) is a versatile tool for running large language models optimized for efficiency. It supports multiple quantization formats (e.g., GGML and GGUF) and provides inference capabilities on diverse hardware, including CPUs, GPUs, and mobile devices. The GGUF format is the latest evolution, designed to enhance compatibility and performance.
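For reference, a Q4_K_M GGUF file like the one in this repo is typically produced with llama.cpp's own conversion and quantization tools. A minimal sketch, assuming a local copy of the original checkpoint and a built llama.cpp tree; the paths are illustrative, and exact script and flag names can vary between llama.cpp versions:

```bash
# Clone and build llama.cpp (CMake build; binaries land in build/bin/).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# 1) Convert the original Hugging Face checkpoint to a full-precision GGUF.
#    /path/to/nb-llama-3.2-3B is an illustrative local path.
python convert_hf_to_gguf.py /path/to/nb-llama-3.2-3B \
  --outfile nb-llama-3.2-3b-f16.gguf --outtype f16

# 2) Quantize the F16 GGUF down to Q4_K_M (a ~4-bit k-quant).
./build/bin/llama-quantize nb-llama-3.2-3b-f16.gguf \
  nb-llama-3.2-3b-q4_k_m.gguf Q4_K_M
```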
@@ -31,18 +31,18 @@ To use this quantized model with `llama.cpp`, follow the steps below:
 
 #### CLI:
 ```bash
-llama-cli --hf-repo north/nb-llama-3.2-3B-Q4_K_M-GGUF --hf-file nb-llama-3.2-3b-q4_k_m.gguf -p "Your prompt here"
+llama-cli --hf-repo NbAiLab/nb-llama-3.2-3B-Q4_K_M-GGUF --hf-file nb-llama-3.2-3b-q4_k_m.gguf -p "Your prompt here"
 ```
 
 #### Server:
 ```bash
-llama-server --hf-repo north/nb-llama-3.2-3B-Q4_K_M-GGUF --hf-file nb-llama-3.2-3b-q4_k_m.gguf -c 2048
+llama-server --hf-repo NbAiLab/nb-llama-3.2-3B-Q4_K_M-GGUF --hf-file nb-llama-3.2-3b-q4_k_m.gguf -c 2048
 ```
 
 For more information, refer to the [llama.cpp repository](https://github.com/ggerganov/llama.cpp).
 
 ## Additional Resources
-- [Original Model Card](https://huggingface.co/north/nb-llama-3.2-3B)
+- [Original Model Card](https://huggingface.co/NbAiLab/nb-llama-3.2-3B)
 - [llama.cpp Repository](https://github.com/ggerganov/llama.cpp)
 - [GGUF Format Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/llama)
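As an alternative to the `--hf-repo` shorthand in the `llama-cli` line above, the GGUF file can be downloaded once and run from a local path. A sketch, assuming the `huggingface_hub` CLI is installed; repo and file names are as in the commands above:

```bash
# Fetch the quantized file from the Hub into the current directory.
pip install -U "huggingface_hub[cli]"
huggingface-cli download NbAiLab/nb-llama-3.2-3B-Q4_K_M-GGUF \
  nb-llama-3.2-3b-q4_k_m.gguf --local-dir .

# Run against the local file: -m model path, -p prompt, -n max new tokens.
llama-cli -m nb-llama-3.2-3b-q4_k_m.gguf -p "Your prompt here" -n 128
```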
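Once the `llama-server` command above is running, it serves an HTTP API (on port 8080 by default) that includes an OpenAI-compatible chat endpoint. A minimal request; the Norwegian prompt text is purely illustrative:

```bash
# Ask the locally served model a question via the OpenAI-compatible API.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Hva er hovedstaden i Norge?"}
        ],
        "max_tokens": 128
      }'
```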