---
language:
- en
- no
pipeline_tag: text-generation
inference: false
tags:
- pytorch
- llama
- llama-2
- norwegian
- norsk
datasets:
- NbAiLab/norwegian-alpaca
---
# Llama 2 13b Chat Norwegian GPTQ
**This is a GPTQ version of Llama 2 13b Chat Norwegian**

Read more about [GPTQ here](https://towardsdatascience.com/4-bit-quantization-with-gptq-36b0f4f02c34). 
GPTQ models are less computationally intensive to run, but their accuracy is often significantly lower than that of full-precision models. In testing, this model has shown that it can reply in Norwegian; however, its knowledge and the quality of its responses are limited.
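
As a rough illustration of why a GPTQ model is cheaper to run, the sketch below estimates weight storage for a nominal 13 billion parameters in 16-bit precision versus 4-bit GPTQ with a group size of 128 (as in the `gptq_model-4bit-128g` file used in the demo script). It assumes one fp16 scale and one packed 4-bit zero point per group, and it ignores layers that usually stay unquantized (such as the token embeddings and output layer), activations and the KV cache, so actual memory use will be higher.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8


def gptq_bits_per_weight(bits: int, group_size: int,
                         scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Effective bits per weight, including the per-group overhead.

    Assumption: each group of group_size weights shares one fp16 scale
    and one packed 4-bit zero point.
    """
    return bits + (scale_bits + zero_bits) / group_size


N_PARAMS = 13e9  # nominal parameter count for a 13b model

fp16_gb = weight_bytes(N_PARAMS, 16) / 1e9
gptq_gb = weight_bytes(N_PARAMS, gptq_bits_per_weight(4, 128)) / 1e9

print(f"fp16 weights:       ~{fp16_gb:.1f} GB")
print(f"4-bit GPTQ (g=128): ~{gptq_gb:.1f} GB")
```

Under these assumptions the estimate comes to roughly 26 GB for fp16 weights versus about 6.8 GB for the 4-bit weights, which is why the quantized model fits on a single consumer GPU.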

For a demo script, see [here](#demo-script).

Llama-2-13b-chat-norwegian-GPTQ is a variant of [Meta](https://huggingface.co/meta-llama)'s [Llama 2 13b Chat](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) model, fine-tuned on a mix of Norwegian datasets. It was created at [Ruter AI Lab](https://ruter.no) in the summer of 2023.

The model is tuned to understand and generate text in Norwegian. It was trained for one epoch on norwegian-alpaca plus 15,000 machine-translated samples from OpenOrca (dataset to be released). A small subset of custom-made instructional data is also included.

For other versions of this model see:
* [Llama-2-13b-chat-norwegian](https://huggingface.co/RuterNorway/Llama-2-13b-chat-norwegian)
* [Llama-2-13b-chat-norwegian-LoRa](https://huggingface.co/RuterNorway/Llama-2-13b-chat-norwegian-LoRa)
* [Llama-2-13b-chat-norwegian-GPTQ](https://huggingface.co/RuterNorway/Llama-2-13b-chat-norwegian-GPTQ)


## Data
* Norwegian alpaca
* 15k Norwegian OpenOrca (to be released)
* Small subset of custom-made instructional data

## Intended Use
This model is intended for commercial and research use in Norwegian and can be used as an assistant-like chat.

## Prompt Template
Llama2 Chat uses a new prompt format:

```
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. Please answer in the same language as the user.
<</SYS>>
This is a test question[/INST] This is an answer </s><s>
```
See also the original implementation [here](https://github.com/facebookresearch/llama/blob/main/llama/generation.py#L213).
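
The helper below sketches how such a prompt can be assembled in Python, including earlier turns of a conversation. Its whitespace follows Meta's reference implementation linked above, which differs slightly from the template shown (a space before `[/INST]` and a blank line after `<</SYS>>`); this card does not state which variant the fine-tune saw, so treat the exact whitespace as an assumption. The function name `build_llama2_chat_prompt` is hypothetical.

```python
from typing import List, Optional, Tuple

# Marker strings, with whitespace as in Meta's reference generation.py
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
BOS, EOS = "<s>", "</s>"


def build_llama2_chat_prompt(
    user_message: str,
    system_prompt: Optional[str] = None,
    history: Optional[List[Tuple[str, str]]] = None,
) -> str:
    """Return a Llama 2 chat prompt that ends with an open [INST] turn.

    history holds earlier (user, assistant) exchanges, oldest first.
    """
    turns: List[Tuple[str, Optional[str]]] = [*(history or []), (user_message, None)]
    if system_prompt:
        # The system prompt is folded into the first user message
        first_user, first_answer = turns[0]
        turns[0] = (B_SYS + system_prompt + E_SYS + first_user, first_answer)

    parts = []
    for user, answer in turns:
        if answer is None:
            # The final turn is left open so the model writes the answer
            parts.append(f"{BOS}{B_INST} {user.strip()} {E_INST}")
        else:
            # Completed turns are closed with the end-of-sequence marker
            parts.append(f"{BOS}{B_INST} {user.strip()} {E_INST} {answer.strip()} {EOS}")
    return "".join(parts)


print(build_llama2_chat_prompt(
    "Fortell meg om AI",
    system_prompt="Please answer in the same language as the user.",
))
```

If you tokenize the result with a Hugging Face Llama tokenizer, which prepends a BOS token by default, strip the leading `<s>` first; otherwise the prompt starts with two BOS tokens.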

We also implemented the Alpaca prompt format, which the model supports:
```
### Instruction:
Summarize the following text.
### Input:
Text to be summarized
### Response:
```
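
A minimal sketch of building this format in Python follows. The function name `build_alpaca_prompt` is hypothetical, and two details are assumptions not stated in this card: the `### Input:` section is dropped when there is no input, and no introductory preamble sentence (as used in the original Stanford Alpaca template) is added. Passing `response` produces a complete training-style example instead of a generation prompt.

```python
from typing import Optional


def build_alpaca_prompt(instruction: str, input_text: Optional[str] = None,
                        response: Optional[str] = None) -> str:
    """Build an Alpaca-style prompt with the section headers shown above."""
    lines = ["### Instruction:", instruction.strip()]
    if input_text:
        # Assumption: the Input section is omitted when there is no extra context
        lines += ["### Input:", input_text.strip()]
    lines.append("### Response:")
    if response is not None:
        # For a training example, the target answer follows the Response header
        lines.append(response.strip())
    return "\n".join(lines)


print(build_alpaca_prompt("Summarize the following text.", "Text to be summarized"))
```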

## Why this model?
As a Norwegian company, we understand firsthand the pressing need for powerful language models tailored to specific languages. Our primary focus is on the Norwegian linguistic landscape. In the age of digitization, languages that lack robust, open-source models risk becoming marginalized. This is why we're introducing this open-source Norwegian model. We believe that by making such resources freely accessible, we can democratize information, foster innovation, and create a more inclusive digital ecosystem. Our aspiration is for this model to serve as a foundational resource for future specialized Norwegian models. Ultimately, our goal is to bolster the Norwegian NLP community and facilitate the smoother integration of Norwegian models into diverse projects.

## Limitations
*   This is an LLM, not a knowledge model. It cannot be expected to have more information about Norway than the base model.
*   It will generally perform better on tasks that involve summarization, question answering and chat than on tasks that require more knowledge about Norway or specific domains, or tasks where the model can answer freely.
*   The data used for training is machine-translated and may contain grammatical and other errors.
*   The model is released as is and will in most cases need prompt tuning to achieve optimal results.


## License
Llama 2 is licensed under the LLAMA 2 [Community License](https://ai.meta.com/resources/models-and-libraries/llama-downloads/), Copyright © Meta Platforms, Inc. All Rights Reserved.
See the original [model card](https://huggingface.co/meta-llama/Llama-2-13b) for more information.

From [norwegian-alpaca](https://huggingface.co/NbAiLab/norwegian-alpaca) we also note that "the current version uses OpenAI's gpt-3.5-turbo; hence, this dataset cannot be used to create models that compete in any way against OpenAI."


## Disclaimer
*   The model is available "as is". Ruter AS takes no responsibility for further use.
*   During testing, the safeguards implemented by Meta appeared to still work as expected in this model. However, we want to point to the Ethical Considerations and Limitations from the original model card:
```
Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios.
For these reasons, as with all LLMs, Llama 2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts.
Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model.
Please see the Responsible Use Guide available at https://ai.meta.com/llama/responsible-use-guide/
```
## Credits
This model was made at Ruter's AI Lab in the summer of 2023 as part of Ruter's AI initiative.

The team would like to thank the entire Ruter organization for its support, especially the Data Science team.

___
# Llama 2 13b Chat Norwegian GPTQ (Norsk)
**This is a GPTQ version of the Llama 2 13b Chat Norwegian model**
Llama-2-13b-chat-norwegian-GPTQ is a version of [Meta](https://huggingface.co/meta-llama)'s [Llama 2 13b Chat](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) model, fine-tuned on a combination of various Norwegian datasets. The model was made at [Ruter AI Lab](https://ruter.no) in 2023.

Read more about [GPTQ here](https://towardsdatascience.com/4-bit-quantization-with-gptq-36b0f4f02c34).

For a demo script, see [here](#demo-script).

Other versions of the model:

* [Llama-2-13b-chat-norwegian](https://huggingface.co/RuterNorway/Llama-2-13b-chat-norwegian)
* [Llama-2-13b-chat-norwegian-LoRa](https://huggingface.co/RuterNorway/Llama-2-13b-chat-norwegian-LoRa)
* [Llama-2-13b-chat-norwegian-GPTQ](https://huggingface.co/RuterNorway/Llama-2-13b-chat-norwegian-GPTQ)


The model is fine-tuned to understand and generate text in Norwegian. It was trained for one epoch on norwegian-alpaca plus a selection of 15,000 machine-translated samples from OpenOrca (dataset awaiting release). The training data also includes a small set of custom-made instructional data.



## Data
* Norwegian alpaca
* 15k Norwegian OpenOrca (awaiting release)
* Small set of custom-made instructional data


## Prompt Template
Llama2 Chat uses a new prompt format:
```
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. Please answer in the same language as the user.
<</SYS>>
This is a test question[/INST] This is an answer </s><s>
```
See the original implementation [here](https://github.com/facebookresearch/llama/blob/main/llama/generation.py#L213).

We have also implemented the Alpaca prompt format, which the model also supports:
```
### Instruction:
Summarize the following text.
### Input:
Text to be summarized
### Response:
```

## Why this model?
As a Norwegian company, we understand firsthand the pressing need for powerful language models tailored to specific languages. Our primary focus is on the Norwegian language area. In the digital age, languages that lack robust, open-source models risk being marginalized. This is why we are now introducing this open-source model for Norwegian. We believe that by making these resources freely available, we can democratize information, foster innovation and create a more inclusive digital ecosystem. Our ambition is for this model to serve as a foundational resource for future specialized Norwegian models. Our goal is to strengthen the Norwegian NLP community and make it easier to incorporate Norwegian models into various projects.

## Limitations
*   This is an LLM, not a knowledge model. It cannot be expected to have more information about Norway than the base model.
*   It will generally perform better on tasks involving summarization, question answering and chat than on tasks that require more knowledge about Norway or specific domains, or tasks where the model can answer freely.
*   The data used for training is machine-translated and may contain grammatical errors. We have only done a quick manual check of the data.
*   The model is released as is and will in most cases need "prompt tuning" to achieve the desired results.

## License
Llama 2 is licensed under the LLAMA 2 [Community License](https://ai.meta.com/resources/models-and-libraries/llama-downloads/), Copyright © Meta Platforms, Inc. All Rights Reserved.
See the original [model card](https://huggingface.co/meta-llama/Llama-2-13b) for more information.


From [norwegian-alpaca](https://huggingface.co/NbAiLab/norwegian-alpaca) we would like to point out that "the current version uses OpenAI's gpt-3.5-turbo; hence, this dataset cannot be used to create models that compete in any way against OpenAI."


## Disclaimer
*   The model is made available "as is". Ruter AS takes no responsibility for further use.
*   During testing, the safety measures implemented by Meta appeared to still work as expected for this model. We do, however, draw attention to the ethical considerations and limitations from the original model card:
```
Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios.
For these reasons, as with all LLMs, Llama 2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts.
Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model.
Please see the Responsible Use Guide available at https://ai.meta.com/llama/responsible-use-guide/
```

# Demo script
## How to use this GPTQ model from Python code

First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:

`GITHUB_ACTIONS=true pip install auto-gptq`

Then try the following example code:

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "RuterNorway/Llama-2-13b-chat-norwegian-GPTQ"
model_basename = "gptq_model-4bit-128g"
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the 4-bit quantized weights onto the first CUDA GPU
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

"""
To download from a specific branch, use the revision parameter, as in this example:

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        revision="gptq-4bit-32g-actorder_True",
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        quantize_config=None)
"""

prompt = "Fortell meg om AI"
# Llama 2 chat format (see "Prompt Template"); the tokenizer prepends the <s> token itself
prompt_template = f"[INST] {prompt} [/INST]"

print("\n\n*** Generate:")
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
# do_sample=True is required for temperature to have any effect
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline
# Prevent printing spurious transformers errors when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)
print(pipe(prompt_template)[0]['generated_text'])
```
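
Because `tokenizer.decode(output[0])` returns the prompt together with the generated text, it can be handy to keep only the model's reply. The hypothetical helper below does this on the decoded string by splitting on the last prompt marker; pass whichever marker your prompt template ends with (`[/INST]` for the Llama 2 chat format, `### Response:` for the Alpaca format). It is a string-level sketch, not part of AutoGPTQ or transformers.

```python
def extract_reply(decoded: str, end_marker: str = "[/INST]",
                  eos: str = "</s>") -> str:
    """Return only the text generated after the last prompt marker."""
    # Everything after the final marker is the model's continuation
    reply = decoded.rsplit(end_marker, 1)[-1]
    # Cut at the first end-of-sequence token, if the model emitted one
    reply = reply.split(eos, 1)[0]
    return reply.strip()


print(extract_reply("<s>[INST] Fortell meg om AI [/INST] AI er kunstig intelligens. </s>"))
```

An alternative that avoids string matching is to decode only the newly generated tokens, for example `tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)`, since the output of a decoder-only model starts with the prompt tokens.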

## Compatibility

The files provided will work with AutoGPTQ (CUDA and Triton modes), GPTQ-for-LLaMa (only CUDA has been tested), ExLlama, and Occ4m's GPTQ-for-LLaMa fork.