juntaoyuan committed
Commit 23de599
1 Parent(s): 2439f29
Update README.md

README.md CHANGED
@@ -51,63 +51,62 @@ curl -LO https://github.com/second-state/llama-utils/raw/main/simple/llama-simple.wasm
 curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm
 ```
 
-## Use the
+## Use the quantized models
 
 
-The
+The `q5_k_m` version is a quantized version of the llama2 models. These files are only half the size of the originals, and hence consume half as much VRAM, but still give high-quality inference results.
 
 Chat with the 7b chat model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
 ```
 
 Generate text with the 7b base model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-q5_k_m.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
 ```
 
 Chat with the 13b chat model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-chat-q5_k_m.gguf llama-chat.wasm
 ```
 
 Generate text with the 13b base model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-q5_k_m.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
 ```
 
-
-## Use the quantized models
+## Use the f16 models
 
 
-The
+The f16 version is the GGUF equivalent of the original llama2 models. It gives the best-quality inference results, but also consumes the most resources in both VRAM and computing time. The f16 models are also a great basis for fine-tuning.
 
 Chat with the 7b chat model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-f16.gguf llama-chat.wasm
 ```
 
 Generate text with the 7b base model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-f16.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
 ```
 
 Chat with the 13b chat model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-chat-f16.gguf llama-chat.wasm
 ```
 
 Generate text with the 13b base model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-f16.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
 ```
 
 ## Resource constrained models
@@ -118,23 +117,23 @@ The `q2_k` version is the smallest quantized version of the llama2 models. They
 Chat with the 7b chat model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q2_k.gguf llama-chat.wasm
 ```
 
 Generate text with the 7b base model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-q2_k.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
 ```
 
 Chat with the 13b chat model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-chat-q2_k.gguf llama-chat.wasm
 ```
 
 Generate text with the 13b base model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-q2_k.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
 ```