Triangle104 committed (verified)
Commit 468fb81 · Parent(s): 79b7c48

Update README.md

Files changed (1): README.md (+261 −0)
README.md CHANGED
@@ -20,6 +20,267 @@ pipeline_tag: text-generation
  This model was converted to GGUF format from [`prithivMLmods/Llama-Thinker-3B-Preview2`](https://huggingface.co/prithivMLmods/Llama-Thinker-3B-Preview2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/prithivMLmods/Llama-Thinker-3B-Preview2) for more details on the model.
+ ---
+ ## Model details
+ Llama-Thinker-3B-Preview2 is a pretrained and instruction-tuned
+ generative model designed for multilingual applications. These models
+ are trained using synthetic datasets based on long chains of thought,
+ enabling them to perform complex reasoning tasks effectively.
+
+ Model Architecture: Llama-Thinker-3B-Preview2 is based on Llama 3.2, an
+ autoregressive language model that uses an optimized transformer
+ architecture. The tuned versions undergo supervised fine-tuning (SFT)
+ and reinforcement learning with human feedback (RLHF) to align with
+ human preferences for helpfulness and safety.
+
+ ## Use with transformers
+
+ Starting with transformers >= 4.43.0, you can run conversational
+ inference using the Transformers pipeline abstraction or by leveraging
+ the Auto classes with the generate() function.
+
+ Make sure to update your transformers installation via `pip install --upgrade transformers`.
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "prithivMLmods/Llama-Thinker-3B-Preview2"
+ pipe = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+ outputs = pipe(
+     messages,
+     max_new_tokens=256,
+ )
+ print(outputs[0]["generated_text"][-1])
+ ```
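+
+ The text above also mentions the Auto classes with generate(); here is a
+ minimal sketch of the same chat done that way (the generation settings
+ are illustrative, not from the model card):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "prithivMLmods/Llama-Thinker-3B-Preview2"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+ # Apply the model's chat template and move the inputs to the model's device.
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ outputs = model.generate(inputs, max_new_tokens=256)
+ # Decode only the newly generated tokens, skipping the prompt.
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```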
+
+ Note: You can also find detailed recipes on how to use the model
+ locally, with torch.compile(), assisted generation, quantised models,
+ and more at huggingface-llama-recipes.
+
+ ## Use with llama
+
+ Please follow the instructions in the repository.
+
+ To download the original checkpoints, see the example command below leveraging huggingface-cli:
+
+ ```bash
+ huggingface-cli download prithivMLmods/Llama-Thinker-3B-Preview2 --include "original/*" --local-dir Llama-Thinker-3B-Preview2
+ ```
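+
+ Equivalently, the download can be scripted with the huggingface_hub
+ Python library; a minimal sketch mirroring the CLI call above:
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Fetch only the original/* checkpoint files into a local directory.
+ snapshot_download(
+     repo_id="prithivMLmods/Llama-Thinker-3B-Preview2",
+     allow_patterns=["original/*"],
+     local_dir="Llama-Thinker-3B-Preview2",
+ )
+ ```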
+
+ ## How to Run Llama-Thinker-3B-Preview2 on Ollama Locally
+
+ This guide demonstrates how to run the Llama-Thinker-3B-Preview2-GGUF
+ model locally using Ollama. The model is instruction-tuned for
+ multilingual tasks and complex reasoning, making it highly versatile for
+ a wide range of use cases. By the end, you'll be equipped to run this
+ and other open-source models with ease.
+
+ ### Example 1: How to Run the Llama-Thinker-3B-Preview2 Model
+
+ The Llama-Thinker-3B-Preview2 model is a pretrained
+ and instruction-tuned LLM, designed for complex reasoning tasks across
+ multiple languages. In this guide, we'll interact with it locally using
+ Ollama, with support for quantized models.
+
+ #### Step 1: Download the Model
+
+ First, download the Llama-Thinker-3B-Preview2-GGUF model using the following command:
+
+ ```bash
+ ollama run llama-thinker-3b-preview2.gguf
+ ```
+
+ #### Step 2: Model Initialization and Download
+
+ Once the command is executed, Ollama will initialize and download the
+ necessary model files. You should see output similar to this:
+
+ ```
+ pulling manifest
+ pulling a12cd3456efg... 100% ▕████████████████▏ 3.2 GB
+ pulling 9f87ghijklmn... 100% ▕████████████████▏ 6.5 KB
+ verifying sha256 digest
+ writing manifest
+ removing any unused layers
+ success
+ >>> Send a message (/? for help)
+ ```
+
+ #### Step 3: Interact with the Model
+
+ Once the model is fully loaded, you can interact with it by sending prompts. For example, let's ask:
+
+ ```
+ >>> How can you assist me today?
+ ```
+
+ A sample response might look like this (actual output may differ):
+
+ ```
+ I am Llama-Thinker-3B-Preview2, an advanced AI language model designed to assist with complex reasoning, multilingual tasks, and general-purpose queries. Here are a few things I can help you with:
+
+ 1. Answering complex questions in multiple languages.
+ 2. Assisting with creative writing, content generation, and problem-solving.
+ 3. Providing detailed summaries and explanations.
+ 4. Translating text across different languages.
+ 5. Generating ideas for personal or professional use.
+ 6. Offering insights on technical topics.
+
+ Feel free to ask me anything you'd like assistance with!
+ ```
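+
+ You can also send the same prompt programmatically. Ollama exposes a
+ local REST API on port 11434; the sketch below assumes the model is
+ available under the name used in Step 1:
+
+ ```python
+ import json
+ import urllib.request
+
+ # Ollama's local endpoint for single-turn text generation.
+ url = "http://localhost:11434/api/generate"
+ payload = {
+     "model": "llama-thinker-3b-preview2.gguf",  # local model name from Step 1
+     "prompt": "How can you assist me today?",
+     "stream": False,  # return one JSON object instead of a token stream
+ }
+
+ req = urllib.request.Request(
+     url,
+     data=json.dumps(payload).encode("utf-8"),
+     headers={"Content-Type": "application/json"},
+ )
+ with urllib.request.urlopen(req) as resp:
+     print(json.loads(resp.read())["response"])
+ ```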
+
+ #### Step 4: Exit the Program
+
+ To exit the program, simply type:
+
+ ```
+ /bye
+ ```
+
+ ### Example 2: Using Multi-Modal Models (Future Use)
+
+ In the future, Ollama may support multi-modal models where you can
+ input both text and images for advanced interactions. This section will
+ be updated as new capabilities become available.
+
+ ### Notes on Using Quantized Models
+
+ Quantized models like llama-thinker-3b-preview2.gguf
+ are optimized for efficient performance on local systems with limited
+ resources. Here are some key points to ensure smooth operation:
+
+ - VRAM/CPU Requirements: Ensure your system has adequate VRAM or CPU resources to handle model inference (a rough check is sketched below).
+ - Model Format: Use the .gguf model format for compatibility with Ollama.
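+
+ As a rough sanity check before loading, you can compare the GGUF file
+ size against available GPU memory. A minimal sketch (the file path and
+ the 1.2x overhead factor are illustrative assumptions, not figures from
+ this model card):
+
+ ```python
+ import os
+ import torch
+
+ gguf_path = "llama-thinker-3b-preview2.gguf"  # hypothetical local path
+ file_size_gb = os.path.getsize(gguf_path) / 1024**3
+
+ if torch.cuda.is_available():
+     vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
+     print(f"model file: {file_size_gb:.1f} GiB, GPU VRAM: {vram_gb:.1f} GiB")
+     # Heuristic: weights plus KV cache and runtime overhead should fit in VRAM.
+     if file_size_gb * 1.2 > vram_gb:
+         print("Model may not fit entirely in VRAM; expect partial CPU offload.")
+ else:
+     print(f"No GPU detected; the {file_size_gb:.1f} GiB model will run on CPU.")
+ ```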
+
+ ### Conclusion
+
+ Running the Llama-Thinker-3B-Preview2 model locally
+ using Ollama provides a powerful way to leverage open-source LLMs for
+ complex reasoning and multilingual tasks. By following this guide, you
+ can explore other models and expand your use cases as new models become
+ available.
+
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)