SandLogicTechnologies committed on
Commit
7461a1b
·
verified ·
1 Parent(s): b34f917

Create README.md

  1. README.md +131 -0
README.md ADDED
@@ -0,0 +1,131 @@
+ ---
+ language:
+ - en
+ - de
+ - es
+ base_model:
+ - ibm-granite/granite-3.1-8b-instruct
+ ---
+ ## SandLogic Technology Quantized Granite-3.1-8B-Instruct-GGUF
+
+ This repository contains Q4_K_M and Q5_K_M quantized versions of the [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) model. These quantized variants maintain the core capabilities of the original model while significantly reducing its memory footprint and increasing inference speed.
+
+ Discover our full range of quantized language models by visiting our [SandLogic Lexicon](https://github.com/sandlogic/SandLogic-Lexicon) GitHub repository. To learn more about our company and services, check out our website at [SandLogic](https://www.sandlogic.com).
+
+ ## Model Details
+
+ - **Original Model**: Granite-3.1-8B-Instruct
+ - **Quantized Versions**:
+   - Q4_K_M (4-bit quantization)
+   - Q5_K_M (5-bit quantization)
+ - **Base Architecture**: 8B parameter long-context instruct model
+ - **Developer**: Granite Team, IBM
+ - **License**: Apache 2.0
+ - **Release Date**: December 18th, 2024
+
+ ## Quantization Benefits
+
+ ### Q4_K_M Version
+ - Reduced model size: ~4 GB (75% smaller than the original)
+ - Faster inference
+ - Minimal quality degradation
+ - Well suited to resource-constrained environments
+
+ ### Q5_K_M Version
+ - Reduced model size: ~5 GB (69% smaller than the original)
+ - Better quality preservation than Q4_K_M
+ - Balanced trade-off between model size and output quality
+ - Recommended for quality-sensitive applications
+
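The percentages above follow from simple arithmetic if one assumes the unquantized checkpoint occupies roughly 16 GB (8B parameters at 2 bytes each in 16-bit precision). That baseline is an assumption made here for illustration; the README does not state it. A minimal sketch:

```python
# Assumption: ~16 GB baseline (8B parameters x 2 bytes in 16-bit precision).
ORIGINAL_GB = 16.0


def size_reduction_percent(quantized_gb: float, original_gb: float = ORIGINAL_GB) -> float:
    """Return how much smaller the quantized file is, as a percentage."""
    return 100.0 * (1.0 - quantized_gb / original_gb)


# Approximate file sizes as quoted in the list above.
for name, size_gb in [("Q4_K_M", 4.0), ("Q5_K_M", 5.0)]:
    reduction = size_reduction_percent(size_gb)
    print(f"{name}: ~{size_gb:.0f} GB, about {reduction:.0f}% smaller")
```

Under that assumed baseline, the two quoted sizes reproduce the 75% and 69% figures.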
+ ## Supported Languages
+
+ The quantized models maintain support for all original languages:
+ - English
+ - German
+ - Spanish
+ - French
+ - Japanese
+ - Portuguese
+ - Arabic
+ - Czech
+ - Italian
+ - Korean
+ - Dutch
+ - Chinese
+
+ Users can fine-tune these quantized models for additional languages.
+
+ ## Capabilities
+
+ Both quantized versions preserve the original model's capabilities:
+ - Summarization
+ - Text classification
+ - Text extraction
+ - Question answering
+ - Retrieval-Augmented Generation (RAG)
+ - Code-related tasks
+ - Function-calling tasks
+ - Multilingual dialog use cases
+ - Long-context tasks, including document/meeting summarization and QA
+
+ ## Usage
+
+ ```python
+ from llama_cpp import Llama
+
+ # Load the quantized GGUF file from a local path.
+ llm = Llama(
+     model_path="models/granite-3.1-8b-instruct-Q4_K_M.gguf",
+     verbose=False,
+     # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
+     # n_ctx=2048,  # Uncomment to increase the context window
+ )
+
+ # Send a system prompt and a user prompt as a chat completion request.
+ output = llm.create_chat_completion(
+     messages=[
+         {
+             "role": "system",
+             "content": "You are an AI Assistant",
+         },
+         {"role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location."},
+     ]
+ )
+
+ # Print only the assistant's reply text.
+ print(output["choices"][0]["message"]["content"])
+ ```
+
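The request and response in the usage snippet above are plain Python lists and dictionaries, so they can be wrapped in small helpers. The sketch below is illustrative only: `build_messages` and `reply_text` are hypothetical names, not part of llama-cpp-python, and `sample_output` is a hand-written stand-in that mirrors the `choices[0]["message"]["content"]` path used above rather than real model output.

```python
from typing import Any, Dict, List


def build_messages(
    user_prompt: str, system_prompt: str = "You are an AI Assistant"
) -> List[Dict[str, str]]:
    """Build a chat message list in the shape passed to create_chat_completion."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def reply_text(output: Dict[str, Any]) -> str:
    """Extract the assistant reply from a chat completion response dict."""
    return output["choices"][0]["message"]["content"].strip()


# Hand-written stand-in for a response (hypothetical content, not model output).
sample_output = {
    "choices": [{"message": {"role": "assistant", "content": " Example reply "}}]
}

print(build_messages("Hello")[1])
print(reply_text(sample_output))
```

With a loaded model, `reply_text(llm.create_chat_completion(messages=build_messages(prompt)))` would play the role of the final `print` argument in the snippet above.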
+ ## Intended Use
+
+ These quantized models are designed for:
+ - Resource-constrained environments
+ - Edge deployment scenarios
+ - Applications requiring faster inference
+ - Building AI assistants for multiple domains
+ - Business applications with limited computational resources
+
+ ## Training Information
+
+ The base model was trained on:
+ 1. Publicly available datasets with permissive licenses
+ 2. Internal synthetic data targeting specific capabilities
+ 3. Small amounts of human-curated data
+
+ Detailed attribution can be found in the upcoming Granite 3.1 Technical Report.
+
+ ## Acknowledgements
+
+ We thank the IBM Granite Team for developing the original Granite model and the creators of the bigbio/med_qa dataset.
+ Special thanks to Georgi Gerganov and the entire llama.cpp development team for their outstanding contributions.
+
+ ## Contact
+
+ For any inquiries or support, please contact us at support@sandlogic.com or visit our [support page](https://www.sandlogic.com/contact-us/).
+