---
language:
- en
- de
- es
base_model:
- ibm-granite/granite-3.1-8b-instruct
---

## SandLogic Technology Quantized Granite-3.1-8B-Instruct-GGUF

This repository contains Q4_K_M and Q5_K_M quantized versions of the [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) model. These quantized variants maintain the core capabilities of the original model while significantly reducing the memory footprint and increasing inference speed.

Discover our full range of quantized language models by visiting our [SandLogic Lexicon](https://github.com/sandlogic/SandLogic-Lexicon) GitHub repository. To learn more about our company and services, check out our website at [SandLogic](https://www.sandlogic.com).

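
If you prefer to fetch a GGUF file programmatically rather than cloning the repository, a minimal sketch using the `huggingface_hub` library is shown below. The `repo_id` shown is only a placeholder for this repository's Hugging Face ID and should be replaced with the actual one; the file name matches the Q4_K_M file referenced in the Usage section.

```python
# Minimal download sketch; assumes the `huggingface_hub` package is installed.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="SandLogic/Granite-3.1-8B-Instruct-GGUF",  # placeholder repo ID, replace with the actual one
    filename="granite-3.1-8b-instruct-Q4_K_M.gguf",    # same file name as in the Usage section below
    local_dir="models",                                # store it under the local models/ directory
)
print(model_path)
```
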
## Model Details

- **Original Model**: Granite-3.1-8B-Instruct
- **Quantized Versions**:
  - Q4_K_M (4-bit quantization)
  - Q5_K_M (5-bit quantization)
- **Base Architecture**: 8B-parameter long-context instruct model
- **Developer**: Granite Team, IBM
- **License**: Apache 2.0
- **Release Date**: December 18th, 2024

## Quantization Benefits

### Q4_K_M Version

- Reduced model size: ~4 GB (about 75% smaller than the original)
- Faster inference speed
- Minimal quality degradation
- Optimal for resource-constrained environments

### Q5_K_M Version

- Reduced model size: ~5 GB (about 69% smaller than the original)
- Better quality preservation compared to Q4_K_M
- Balanced trade-off between model size and performance
- Recommended for quality-sensitive applications

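
To make the trade-off above concrete, here is a small, purely illustrative helper that picks a variant from an approximate memory budget. The size figures mirror the list above, the 1.5x headroom factor is an assumption rather than a measured requirement, and the Q5_K_M file name is assumed to follow the same pattern as the Q4_K_M file.

```python
# Illustrative variant selection based on an approximate memory budget (GB).
# The ~4 GB / ~5 GB figures mirror the sizes listed above; the 1.5x headroom
# factor (for KV cache and runtime overhead) is an assumption, not a measurement.
def pick_gguf_variant(available_gb: float) -> str:
    """Return the largest quantized variant that plausibly fits in the budget."""
    variants = [
        ("granite-3.1-8b-instruct-Q5_K_M.gguf", 5.0),  # better quality preservation
        ("granite-3.1-8b-instruct-Q4_K_M.gguf", 4.0),  # smallest footprint
    ]
    for filename, size_gb in variants:
        if available_gb >= size_gb * 1.5:
            return filename
    raise MemoryError("Not enough memory for either quantized variant.")

print(pick_gguf_variant(available_gb=12.0))  # granite-3.1-8b-instruct-Q5_K_M.gguf
print(pick_gguf_variant(available_gb=7.0))   # granite-3.1-8b-instruct-Q4_K_M.gguf
```
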
## Supported Languages

The quantized models maintain support for all of the original model's languages:

- English
- German
- Spanish
- French
- Japanese
- Portuguese
- Arabic
- Czech
- Italian
- Korean
- Dutch
- Chinese

Users can fine-tune the base Granite 3.1 model for languages beyond these twelve.

## Capabilities

Both quantized versions preserve the original model's capabilities:

- Summarization
- Text classification
- Text extraction
- Question-answering
- Retrieval-Augmented Generation (RAG)
- Code-related tasks
- Function-calling tasks
- Multilingual dialog use cases
- Long-context tasks, including document/meeting summarization and QA

## Usage

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/granite-3.1-8b-instruct-Q4_K_M.gguf",
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # n_ctx=2048,       # Uncomment to increase the context window
)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an AI Assistant"},
        {
            "role": "user",
            "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location.",
        },
    ],
)

print(output["choices"][0]["message"]["content"])
```

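
The same `llm` instance can drive the other capabilities listed earlier. Below is a small, illustrative summarization sketch that reuses the object created above; the prompt wording and the `max_tokens`/`temperature` settings are arbitrary choices, not recommended defaults.

```python
# Reuses the `llm` instance created above to run a short summarization request,
# illustrating one of the capabilities listed in the Capabilities section.
long_text = (
    "IBM Granite 3.1 models are instruction-tuned language models released under "
    "the Apache 2.0 license. The 8B instruct variant supports twelve languages and "
    "long-context tasks such as document and meeting summarization."
)

summary = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an AI Assistant that writes concise summaries."},
        {"role": "user", "content": f"Summarize the following text in one sentence:\n\n{long_text}"},
    ],
    max_tokens=64,     # illustrative cap on the summary length
    temperature=0.2,   # illustrative low temperature for a more stable summary
)

print(summary["choices"][0]["message"]["content"])
```
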
## Intended Use

These quantized models are designed for:

- Resource-constrained environments
- Edge deployment scenarios
- Applications requiring faster inference
- Building AI assistants for multiple domains
- Business applications with limited computational resources

## Training Information

The base model was trained on:

1. Publicly available datasets with permissive licenses
2. Internal synthetic data targeting specific capabilities
3. Small amounts of human-curated data

Detailed attribution can be found in the upcoming Granite 3.1 Technical Report.

## Acknowledgements

We thank the IBM Granite team for developing and releasing the original Granite-3.1-8B-Instruct model.
Special thanks to Georgi Gerganov and the entire llama.cpp development team for their outstanding contributions.

## Contact

For any inquiries or support, please contact us at support@sandlogic.com or visit our [support page](https://www.sandlogic.com/contact-us/).

## Explore More

Browse our full collection of quantized models on the [SandLogic Lexicon](https://github.com/sandlogic/SandLogic-Lexicon) GitHub page, and visit [SandLogic](https://www.sandlogic.com) to learn more about our products and services.