---
license: creativeml-openrail-m
library_name: transformers
tags:
- deep_think
- reasoning
- chain_of_thought
- chain_of_thinking
- prev_2
- self_reasoning
- llama-cpp
- gguf-my-repo
language:
- en
base_model: prithivMLmods/Llama-Thinker-3B-Preview2
pipeline_tag: text-generation
---

# Triangle104/Llama-Thinker-3B-Preview2-Q5_K_S-GGUF

This model was converted to GGUF format from [`prithivMLmods/Llama-Thinker-3B-Preview2`](https://huggingface.co/prithivMLmods/Llama-Thinker-3B-Preview2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/prithivMLmods/Llama-Thinker-3B-Preview2) for more details on the model.

---

## Model details

Llama-Thinker-3B-Preview2 is a pretrained and instruction-tuned generative model designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.

**Model Architecture:** Llama-Thinker-3B-Preview2 is based on Llama 3.2, an autoregressive language model that uses an optimized transformer architecture. The tuned versions undergo supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

## Use with transformers

Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the `Auto` classes with the `generate()` function.

Make sure to update your transformers installation via `pip install --upgrade transformers`.

```python
import torch
from transformers import pipeline

model_id = "prithivMLmods/Llama-Thinker-3B-Preview2"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```

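If you prefer the `Auto` classes route mentioned above, here is a minimal sketch of the same conversation using `AutoTokenizer`, `AutoModelForCausalLM`, and `generate()`; the decoding settings are illustrative, not values prescribed by the model authors:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/Llama-Thinker-3B-Preview2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the conversation with the model's chat template and tokenize it
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate up to 256 new tokens and decode only the newly generated part
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
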
Note: You can also find detailed recipes on how to use the model locally, with `torch.compile()`, assisted generation, quantization, and more at [huggingface-llama-recipes](https://github.com/huggingface/huggingface-llama-recipes).

## Use with `llama`

Please follow the instructions in the repository.

To download the original checkpoints, see the example command below leveraging `huggingface-cli`:

```bash
huggingface-cli download prithivMLmods/Llama-Thinker-3B-Preview2 --include "original/*" --local-dir Llama-Thinker-3B-Preview2
```

## How to Run Llama-Thinker-3B-Preview2 on Ollama Locally

This guide demonstrates how to run the Llama-Thinker-3B-Preview2-GGUF model locally using Ollama. The model is instruction-tuned for multilingual tasks and complex reasoning, making it highly versatile for a wide range of use cases. By the end, you'll be equipped to run this and other open-source models with ease.

### Example 1: How to Run the Llama-Thinker-3B-Preview2 Model

The Llama-Thinker-3B-Preview2 model is a pretrained and instruction-tuned LLM, designed for complex reasoning tasks across multiple languages. In this guide, we'll interact with it locally using Ollama, with support for quantized models.

#### Step 1: Download the Model

First, download the Llama-Thinker-3B-Preview2-GGUF model using the following command:

```bash
ollama run llama-thinker-3b-preview2.gguf
```

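If you already have the GGUF file on disk (for example, downloaded from this repo) rather than pulling a packaged model, a minimal sketch of an alternative route is to register the local weights with Ollama through a Modelfile; the file and model names below are illustrative assumptions:

```bash
# Point a Modelfile at the local GGUF weights (filename is an assumption)
echo 'FROM ./llama-thinker-3b-preview2.gguf' > Modelfile

# Register the weights with Ollama under a short local name
ollama create llama-thinker-3b-preview2 -f Modelfile

# Start an interactive session with the registered model
ollama run llama-thinker-3b-preview2
```
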
#### Step 2: Model Initialization and Download

Once the command is executed, Ollama will initialize and download the necessary model files. You should see output similar to this:

```
pulling manifest
pulling a12cd3456efg... 100% ▕████████████████████████████████████████▏ 3.2 GB
pulling 9f87ghijklmn... 100% ▕████████████████████████████████████████▏ 6.5 KB
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)
```

#### Step 3: Interact with the Model

Once the model is fully loaded, you can interact with it by sending prompts. For example, let's ask:

```
>>> How can you assist me today?
```

A sample response might look like this (responses may vary):

```
I am Llama-Thinker-3B-Preview2, an advanced AI language model designed to assist with complex reasoning, multilingual tasks, and general-purpose queries. Here are a few things I can help you with:

1. Answering complex questions in multiple languages.
2. Assisting with creative writing, content generation, and problem-solving.
3. Providing detailed summaries and explanations.
4. Translating text across different languages.
5. Generating ideas for personal or professional use.
6. Offering insights on technical topics.

Feel free to ask me anything you'd like assistance with!
```

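Beyond the interactive prompt, you can also query a running Ollama instance through its local REST API. A minimal sketch, assuming Ollama's default port 11434 and the model name used in Step 1:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama-thinker-3b-preview2",
  "prompt": "How can you assist me today?",
  "stream": false
}'
```
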
#### Step 4: Exit the Program

To exit the interactive session, simply type:

```
/bye
```

### Example 2: Using Multi-Modal Models (Future Use)

In the future, Ollama may support multi-modal models where you can input both text and images for advanced interactions. This section will be updated as new capabilities become available.

### Notes on Using Quantized Models

Quantized models like `llama-thinker-3b-preview2.gguf` are optimized for efficient performance on local systems with limited resources. Here are some key points to ensure smooth operation:

- **VRAM/CPU Requirements:** Ensure your system has adequate VRAM or CPU resources to handle model inference.
- **Model Format:** Use the `.gguf` model format for compatibility with Ollama.

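To verify what is installed and how much memory a loaded model is consuming, recent Ollama releases provide `list` and `ps` subcommands:

```bash
# List locally installed models and their on-disk sizes
ollama list

# Show models currently loaded in memory, including CPU/GPU placement
ollama ps
```
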
### Conclusion

Running the Llama-Thinker-3B-Preview2 model locally using Ollama provides a powerful way to leverage open-source LLMs for complex reasoning and multilingual tasks. By following this guide, you can explore other models and expand your use cases as new models become available.

---

## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo Triangle104/Llama-Thinker-3B-Preview2-Q5_K_S-GGUF --hf-file llama-thinker-3b-preview2-q5_k_s.gguf -p "The meaning to life and the universe is"
```

### Server:
```bash
llama-server --hf-repo Triangle104/Llama-Thinker-3B-Preview2-Q5_K_S-GGUF --hf-file llama-thinker-3b-preview2-q5_k_s.gguf -c 2048
```

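Once `llama-server` is up, you can send it a request over HTTP. A minimal sketch, assuming the server's default address `127.0.0.1:8080` and its OpenAI-compatible chat endpoint:

```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Who are you?"}
    ],
    "max_tokens": 256
  }'
```
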
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```bash
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g., `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```bash
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.
```bash
./llama-cli --hf-repo Triangle104/Llama-Thinker-3B-Preview2-Q5_K_S-GGUF --hf-file llama-thinker-3b-preview2-q5_k_s.gguf -p "The meaning to life and the universe is"
```
or
```bash
./llama-server --hf-repo Triangle104/Llama-Thinker-3B-Preview2-Q5_K_S-GGUF --hf-file llama-thinker-3b-preview2-q5_k_s.gguf -c 2048
```