This model was converted to GGUF format from [`prithivMLmods/Llama-Thinker-3B-Preview2`](https://huggingface.co/prithivMLmods/Llama-Thinker-3B-Preview2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/prithivMLmods/Llama-Thinker-3B-Preview2) for more details on the model.

---

## Model details

Llama-Thinker-3B-Preview2 is a pretrained and instruction-tuned generative model designed for multilingual applications. It is trained on synthetic datasets based on long chains of thought, enabling it to perform complex reasoning tasks effectively.

Model Architecture: Llama-Thinker-3B-Preview2 is based on Llama 3.2, an autoregressive language model that uses an optimized transformer architecture. The tuned versions undergo supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

## Use with transformers

Starting with transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function.

Make sure to update your transformers installation via `pip install --upgrade transformers`.

```python
import torch
from transformers import pipeline

model_id = "prithivMLmods/Llama-Thinker-3B-Preview2"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
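
The pipeline example above is the model card's own snippet; the Auto-classes route it mentions is not shown there, so here is a minimal sketch of it (an illustration, assuming the checkpoint ships a chat template, as instruction-tuned Llama variants normally do):

```python
# Minimal sketch of conversational inference via the Auto classes and generate().
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/Llama-Thinker-3B-Preview2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
# Build the prompt from the model's chat template, then generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```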

Note: You can also find detailed recipes on how to use the model locally, with torch.compile(), assisted generation, quantization, and more at huggingface-llama-recipes.

## Use with llama

Please follow the instructions in the repository.

To download the original checkpoints, see the example command below leveraging huggingface-cli:

```bash
huggingface-cli download prithivMLmods/Llama-Thinker-3B-Preview2 --include "original/*" --local-dir Llama-Thinker-3B-Preview2
```
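
If you prefer to script the download, the equivalent call through the huggingface_hub Python library (a sketch; `huggingface_hub` is assumed to be installed) is:

```python
# Programmatic equivalent of the huggingface-cli command above (illustrative sketch).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="prithivMLmods/Llama-Thinker-3B-Preview2",
    allow_patterns=["original/*"],  # fetch only the original checkpoint files
    local_dir="Llama-Thinker-3B-Preview2",
)
```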

The following guide is tailored to the Llama-Thinker-3B-Preview2-GGUF model:

## How to Run Llama-Thinker-3B-Preview2 on Ollama Locally

This guide demonstrates how to run the Llama-Thinker-3B-Preview2-GGUF model locally using Ollama. The model is instruction-tuned for multilingual tasks and complex reasoning, making it highly versatile for a wide range of use cases. By the end, you'll be equipped to run this and other open-source models with ease.

### Example 1: How to Run the Llama-Thinker-3B-Preview2 Model

The Llama-Thinker-3B-Preview2 model is a pretrained and instruction-tuned LLM designed for complex reasoning tasks across multiple languages. In this guide, we'll interact with it locally using Ollama, with support for quantized models.

#### Step 1: Download the Model

First, download the Llama-Thinker-3B-Preview2-GGUF model using the following command:

```bash
ollama run llama-thinker-3b-preview2.gguf
```
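
Note that `ollama run` pulls models by name; if you have the GGUF file on disk instead, a common pattern (a sketch, assuming the file sits in the working directory under the hypothetical name below) is to register it via a Modelfile first:

```
# Modelfile — point Ollama at the local GGUF weights (hypothetical filename)
FROM ./llama-thinker-3b-preview2.gguf
```

```bash
ollama create llama-thinker-3b-preview2 -f Modelfile
ollama run llama-thinker-3b-preview2
```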

#### Step 2: Model Initialization and Download

Once the command is executed, Ollama will initialize and download the necessary model files. You should see output similar to this:

```
pulling manifest
pulling a12cd3456efg... 100% ▕████████████████▏ 3.2 GB
pulling 9f87ghijklmn... 100% ▕████████████████▏ 6.5 KB
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)
```

#### Step 3: Interact with the Model

Once the model is fully loaded, you can interact with it by sending prompts. For example, let's ask:

```
>>> How can you assist me today?
```

A sample response might look like this (the exact output may vary):

```
I am Llama-Thinker-3B-Preview2, an advanced AI language model designed to assist with complex reasoning, multilingual tasks, and general-purpose queries. Here are a few things I can help you with:

1. Answering complex questions in multiple languages.
2. Assisting with creative writing, content generation, and problem-solving.
3. Providing detailed summaries and explanations.
4. Translating text across different languages.
5. Generating ideas for personal or professional use.
6. Offering insights on technical topics.

Feel free to ask me anything you'd like assistance with!
```
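
If you want to drive the model from a script rather than the interactive prompt, Ollama also exposes a local REST API (by default on http://localhost:11434); here is a minimal sketch using only the Python standard library (the model name is the hypothetical one registered above):

```python
# Minimal sketch: query the local Ollama server over its REST API.
import json
import urllib.request

payload = {
    "model": "llama-thinker-3b-preview2",  # hypothetical local model name
    "prompt": "How can you assist me today?",
    "stream": False,  # return one JSON object instead of a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```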

#### Step 4: Exit the Program

To exit the interactive session, simply type:

```
/bye
```

### Example 2: Using Multi-Modal Models (Future Use)

In the future, Ollama may support multi-modal models where you can input both text and images for advanced interactions. This section will be updated as new capabilities become available.

### Notes on Using Quantized Models

Quantized models like llama-thinker-3b-preview2.gguf are optimized for efficient performance on local systems with limited resources. Here are some key points to ensure smooth operation:

- VRAM/CPU Requirements: Ensure your system has adequate VRAM or CPU resources to handle model inference (see the rough estimate sketch below).
- Model Format: Use the .gguf model format for compatibility with Ollama.
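
As a rough rule of thumb for the VRAM/CPU point above, the GGUF file itself must fit in free RAM or VRAM, plus headroom for the KV cache and runtime; a back-of-envelope sketch (the figures are illustrative assumptions, not measurements):

```python
# Back-of-envelope memory estimate for running a GGUF model locally.
# All figures are illustrative assumptions, not measured values.
def estimate_memory_gb(model_file_gb: float, overhead_gb: float = 1.0) -> float:
    """Weights must fit in RAM/VRAM, plus KV-cache and runtime overhead."""
    return model_file_gb + overhead_gb

# e.g. a ~3.2 GB quantized file, as in the sample pull output above:
print(f"~{estimate_memory_gb(3.2):.1f} GB of free memory recommended")
```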

### Conclusion

Running the Llama-Thinker-3B-Preview2 model locally using Ollama provides a powerful way to leverage open-source LLMs for complex reasoning and multilingual tasks. By following this guide, you can explore other models and expand your use cases as new models become available.

---
## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):
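
A minimal install command (assuming Homebrew is available on your system):

```bash
brew install llama.cpp
```

You can then chat with the GGUF directly via llama-cli; a sketch, where the repo id and file name are placeholders for this repo's actual quant file:

```bash
llama-cli --hf-repo <this-gguf-repo> --hf-file <quant-file>.gguf -p "Why is the sky blue?"
```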