+---
+language:
+- en
+- de
+- es
+base_model:
+- ibm-granite/granite-3.1-8b-instruct
+---
+## SandLogic Technologies Quantized Granite-3.1-8B-Instruct-GGUF
+This repository contains Q4_K_M and Q5_K_M quantized versions of the [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) model. These quantized variants retain the core capabilities of the original model while significantly reducing its memory footprint and increasing inference speed.
+Discover our full range of quantized language models by visiting our [SandLogic Lexicon](https://github.com/sandlogic/SandLogic-Lexicon) GitHub. To learn more about our company and services, check out our website at [SandLogic](https://www.sandlogic.com).
+## Model Details
+- **Original Model**: Granite-3.1-8B-Instruct
+- **Quantized Versions**:
+  - Q4_K_M (4-bit quantization)
+  - Q5_K_M (5-bit quantization)
+- **Base Architecture**: 8B-parameter long-context instruct model
+- **Developer**: Granite Team, IBM
+- **License**: Apache 2.0
+- **Release Date**: December 18th, 2024
+## Quantization Benefits
+### Q4_K_M Version
+- Reduced model size: ~4GB (75% smaller than original)
+- Faster inference speed
+- Minimal quality degradation
+- Optimal for resource-constrained environments
+### Q5_K_M Version
+- Reduced model size: ~5GB (69% smaller than original)
+- Better quality preservation than Q4_K_M
+- Balanced trade-off between model size and performance
+- Recommended for quality-sensitive applications
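The size reductions quoted above can be checked with a little arithmetic. The sketch below assumes the original model is stored at 16-bit precision (2 bytes per parameter) and has roughly 8 billion parameters, giving a baseline of about 16 GB. Both figures are illustrative assumptions, not measured file sizes, and the function name is hypothetical.

```python
# Assumption: ~8e9 parameters stored at 16 bits (2 bytes) each.
PARAMS = 8e9
BYTES_PER_PARAM_FP16 = 2
BASELINE_GB = PARAMS * BYTES_PER_PARAM_FP16 / 1e9  # about 16 GB


def size_reduction_percent(quantized_gb, baseline_gb=BASELINE_GB):
    """Return how much smaller the quantized file is, as a percentage."""
    return (1 - quantized_gb / baseline_gb) * 100


# Approximate sizes taken from the lists above.
for name, size_gb in [("Q4_K_M", 4), ("Q5_K_M", 5)]:
    print(f"{name}: ~{size_gb} GB, {size_reduction_percent(size_gb):.0f}% smaller")
```

Under these assumptions the 4 GB and 5 GB figures correspond to the 75% and 69% reductions listed for each version.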
+## Supported Languages
+The quantized models maintain support for all original languages:
+- English
+- German
+- Spanish
+- French
+- Japanese
+- Portuguese
+- Arabic
+- Czech
+- Italian
+- Korean
+- Dutch
+- Chinese
+Users can fine-tune these quantized models for additional languages.
+## Capabilities
+Both quantized versions preserve the original model's capabilities:
+- Summarization
+- Text classification
+- Text extraction
+- Question-answering
+- Retrieval Augmented Generation (RAG)
+- Code-related tasks
+- Function-calling tasks
+- Multilingual dialog use cases
+- Long-context tasks including document/meeting summarization and QA
+## Usage
+```python
+from llama_cpp import Llama
+llm = Llama(
+    model_path="models/granite-3.1-8b-instruct-Q4_K_M.gguf",
+    verbose=False,
+    # n_gpu_layers=-1, # Uncomment to use GPU acceleration
+    # n_ctx=2048, # Uncomment to increase the context window
+)
+output = llm.create_chat_completion(
+    messages=[
+        {"role": "system", "content": "You are an AI Assistant"},
+        {
+            "role": "user",
+            "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location.",
+        },
+    ]
+)
+print(output["choices"][0]["message"]["content"])
+```
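For interactive use, it can help to consume the reply as it is generated rather than waiting for the full response. The following is a minimal sketch that assumes `create_chat_completion` is called with `stream=True`, in which case llama-cpp-python yields OpenAI-style chunks whose text arrives under `choices[0]["delta"]["content"]`. The helper names `collect_stream` and `stream_reply` are hypothetical and are not part of the library.

```python
def collect_stream(chunks):
    """Join the text pieces from an iterable of streamed chat chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        piece = delta.get("content")
        if piece:  # some chunks carry only the role or an empty delta
            parts.append(piece)
    return "".join(parts)


def stream_reply(llm, user_prompt, system_prompt="You are an AI Assistant"):
    """Hypothetical wrapper: stream one reply from an already-loaded Llama object."""
    chunks = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        stream=True,  # ask for incremental chunks instead of one response dict
    )
    return collect_stream(chunks)
```

With the `llm` object from the example above, `stream_reply(llm, "Name one IBM Research laboratory.")` would return the assembled reply. To display text incrementally, a caller could print each `piece` inside the loop instead of collecting it.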
+## Intended Use
+These quantized models are designed for:
+- Resource-constrained environments
+- Edge deployment scenarios
+- Applications requiring faster inference
+- Building AI assistants for multiple domains
+- Business applications with limited computational resources
+## Training Information
+The base model was trained on:
+1. Publicly available datasets with permissive licenses
+2. Internal synthetic data targeting specific capabilities
+3. Small amounts of human-curated data
+Detailed attribution can be found in the upcoming Granite 3.1 Technical Report.
+## Acknowledgements
+We thank the Granite Team at IBM for developing the original Granite model and the creators of the bigbio/med_qa dataset.
+Special thanks to Georgi Gerganov and the entire llama.cpp development team for their outstanding contributions.
+## Contact
+For any inquiries or support, please contact us at support@sandlogic.com or visit our [support page](https://www.sandlogic.com/contact-us/).