chainyo committed on
Commit 31458d1
1 Parent(s): 05a7a4a

add adapters + instructions + tokenizer

README.md CHANGED
@@ -9,4 +9,170 @@ tags:
  - peft
  - LoRA
  ---
- # WIP
+ # LoRA LLaMA Natural Instructions
+
+ ![LLaMA Natural Instructions](./llama-natural-instructions-removebg-preview.png)
+
+ This model is a version of [llama-13b](https://huggingface.co/decapoda-research/llama-13b-hf) from [Meta](https://huggingface.co/facebook),
+ fine-tuned on the [Natural Instructions](https://huggingface.co/datasets/Muennighoff/natural-instructions) dataset from [AllenAI](https://huggingface.co/allenai)
+ using the [LoRA](https://arxiv.org/pdf/2106.09685.pdf) training technique.
+
+ ⚠️ **This model is for research purposes only (see the [license](https://huggingface.co/decapoda-research/llama-13b-hf/blob/main/LICENSE)).**
+
+ ## WandB Report
+
+ Click on the badge below to see the full report on Weights & Biases.
+
+ [![WandB](https://img.shields.io/badge/Weights_&_Biases-FFCC33?style=for-the-badge&logo=WeightsAndBiases&logoColor=black)](https://api.wandb.ai/links/chainyo-mleng/91srpylj)
+
+ ## Usage
+
+ ### Installation
+
+ ```bash
+ pip install loralib bitsandbytes datasets git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/transformers.git sentencepiece
+ ```
+
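+ As an optional sanity check, you can verify that the freshly installed libraries import correctly; `torch` is assumed to be installed already, and 8-bit loading with `device_map="auto"` also relies on the `accelerate` package:
+
+ ```python
+ # Optional sanity check: confirm the main dependencies are importable.
+ import torch
+ import peft
+ import transformers
+
+ print("torch:", torch.__version__)
+ print("transformers:", transformers.__version__)
+ print("peft:", peft.__version__)
+ ```
+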
+ ### Format of the input
+
+ The input should be a string of text with the following format:
+
+ ```python
+ from typing import Union
+
+ prompt_template = {
+     "prompt": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n",
+     "response": "### Response:"
+ }
+
+ def generate_prompt(
+     definition: str,
+     inputs: str,
+     targets: Union[None, str] = None,
+ ) -> str:
+     """Generate a prompt from instruction and input."""
+     res = prompt_template["prompt"].format(
+         instruction=definition, input=inputs
+     )
+
+     if targets:
+         res = f"{res}{targets}"
+
+     return res
+
+ def get_response(output: str) -> str:
+     """Get the response from the output."""
+     return output.split(prompt_template["response"])[1].strip()
+ ```
+
+ Feel free to use these utility functions to generate the prompt and to extract the response from the model output; a usage example follows the list below.
+
+ - `definition` is the instruction describing the task. It's generally a single sentence explaining the expected output and
+ the reasoning steps to follow.
+ - `inputs` is the input to the task. It can be a single sentence or a paragraph. It's the context used by the model to
+ generate the response to the task.
+ - `targets` is the expected output of the task. It's used for training the model. _It's not required for inference._
+
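+ For example, with a made-up instruction and input (both invented purely for illustration), the helpers produce the following prompt:
+
+ ```python
+ # Illustrative usage of the helpers above; the instruction and input are invented.
+ prompt = generate_prompt(
+     definition="In this task, you have to classify the sentiment of the sentence as positive or negative.",
+     inputs="I really enjoyed this movie, the acting was fantastic.",
+ )
+ print(prompt)
+ # ### Instruction:
+ # In this task, you have to classify the sentiment of the sentence as positive or negative.
+ #
+ # ### Input:
+ # I really enjoyed this movie, the acting was fantastic.
+ #
+ # ### Response:
+ ```
+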
+ ### Inference
+
+ You can either load the base model and apply only the adapters, or load the full model with the adapters already merged into the weights.
+
+ #### The tokenizer
+
+ ```python
+ from transformers import LlamaTokenizer
+
+ tokenizer = LlamaTokenizer.from_pretrained("wordcab/llama-natural-instructions-13b")
+ tokenizer.padding_side = "left"
+ tokenizer.pad_token_id = 0
+ ```
+
+ #### Load the model with the adapters
+
+ ```python
+ import torch
+ from peft import PeftModel
+ from transformers import LlamaForCausalLM
+
+ model = LlamaForCausalLM.from_pretrained(
+     "decapoda-research/llama-13b-hf",
+     load_in_8bit=True,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+ model = PeftModel.from_pretrained(
+     model,
+     "wordcab/llama-natural-instructions-13b",
+     torch_dtype=torch.float16,
+     device_map={"": 0},
+ )
+ ```
+
+ #### Load the full model
+
+ ⚠️ Work in progress...
+
+ ```python
+ model = LlamaForCausalLM.from_pretrained(
+     "wordcab/llama-natural-instructions-13b",
+     load_in_8bit=True,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+ ```
+
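+ Until the merged checkpoint is available, a possible workaround (a sketch assuming your `peft` version exposes `merge_and_unload`, not necessarily the authors' exact procedure) is to merge the adapters into the base weights yourself and save the result. Merging requires loading the base model in fp16 rather than 8-bit:
+
+ ```python
+ # Hypothetical merge step: fold the LoRA adapters into the base weights,
+ # then save a standalone checkpoint that no longer needs peft at load time.
+ import torch
+ from peft import PeftModel
+ from transformers import LlamaForCausalLM
+
+ base = LlamaForCausalLM.from_pretrained(
+     "decapoda-research/llama-13b-hf",
+     torch_dtype=torch.float16,  # fp16, since merging 8-bit weights is not supported
+     device_map="auto",
+ )
+ peft_model = PeftModel.from_pretrained(base, "wordcab/llama-natural-instructions-13b")
+ merged = peft_model.merge_and_unload()  # returns a plain LlamaForCausalLM
+ merged.save_pretrained("./llama-natural-instructions-13b-merged")  # hypothetical output path
+ ```
+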
+ #### Evaluation mode
+
+ Don't forget to put the model in evaluation mode and, if you are using PyTorch 2.0 or higher, to call the compile method:
+
+ ```python
+ model.eval()
+ if torch.__version__ >= "2":
+     model = torch.compile(model)
+ ```
+
+ #### Generate the response
+
+ ```python
+ from transformers import GenerationConfig
+
+ # Default generation settings; adjust temperature, top_p, etc. to your needs.
+ generation_config = GenerationConfig()
+
+ prompt = generate_prompt(
+     "In this task, you have to analyze the full sentences and do reasoning and quick maths to find the correct answer.",
+     "You are now a superbowl star. You are the quarterback of the team. Your team is down by 3 points. You are in the last 2 minutes of the game. The other team has a score of 28. What is the score of your team?",
+ )
+ inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=2048)
+ input_ids = inputs["input_ids"].to(model.device)
+
+ with torch.no_grad():
+     gen_outputs = model.generate(
+         input_ids=input_ids,
+         generation_config=generation_config,
+         return_dict_in_generate=True,
+         output_scores=True,
+         max_new_tokens=50,
+     )
+
+ s = gen_outputs.sequences[0]
+ output = tokenizer.decode(s, skip_special_tokens=True)
+ response = get_response(output)
+ print(response)
+ >>> 25
+ ```
+
+ You can also try prompts that are not maths-related! :hugs:
+
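+ For instance, reusing the helpers and the loaded model, you could build a summarization prompt (the instruction and input below are invented for illustration) and run it through the same generation code as above:
+
+ ```python
+ # Hypothetical non-maths prompt; generate a response with the same code as before.
+ prompt = generate_prompt(
+     "In this task, you have to summarize the given paragraph in one sentence.",
+     "LoRA freezes the base model and trains small low-rank matrices added to the attention layers, "
+     "which makes fine-tuning large language models much cheaper in memory and compute.",
+ )
+ ```
+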
+ ## Benchmark
+
+ We benchmarked our model on the following tasks: [BoolQ](https://huggingface.co/datasets/boolq), [PIQA](https://huggingface.co/datasets/piqa), [WinoGrande](https://huggingface.co/datasets/winogrande), [OpenBookQA](https://huggingface.co/datasets/openbookqa).
+
+ | | BoolQ | PIQA | WinoGrande | OpenBookQA | Precision | Inference time (s) |
+ | --- | --- | --- | --- | --- | --- | --- |
+ | Original LLaMA 7B | 76.5 | 79.8 | 70.1 | 57.2 | fp32 | 3 |
+ | Original LLaMA 13B | 78.1 | 80.1 | 73.0 | 56.4 | fp32 | >5 |
+ | LoRA LLaMA 7B | 63.9 | 51.3 | 48.9 | 31.4 | 8-bit | 0.65 |
+ | LoRA LLaMA 13B | 70.0 | 63.93 | 51.6 | 50.4 | 8-bit | 1.2 |
+
+ __Link to the 7B model:__ [wordcab/llama-natural-instructions-7b](https://huggingface.co/wordcab/llama-natural-instructions-7b)
+
+ Overall, our LoRA model is less performant than the original model from Meta when compared to the results reported in the [original paper](https://arxiv.org/pdf/2302.13971.pdf).
+
+ The performance degradation comes from loading the model in 8-bit and from using the LoRA adapters.
+ Thanks to the 8-bit quantization, the model is 4 times faster than the original model and the results are still decent.
+
+ Some complex tasks, such as WinoGrande and OpenBookQA, remain more difficult to solve with the adapters.
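+
+ For reference, a minimal way to measure per-prompt inference time on your own hardware (a sketch using simple wall-clock timing, not necessarily the protocol behind the table above) looks like this:
+
+ ```python
+ import time
+
+ import torch
+
+ def time_generation(model, input_ids, max_new_tokens: int = 50) -> float:
+     """Return the wall-clock duration of a single generate() call, in seconds."""
+     if torch.cuda.is_available():
+         torch.cuda.synchronize()
+     start = time.perf_counter()
+     with torch.no_grad():
+         model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
+     if torch.cuda.is_available():
+         torch.cuda.synchronize()
+     return time.perf_counter() - start
+ ```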
adapter_config.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "base_model_name_or_path": "decapoda-research/llama-13b-hf",
+   "bias": "none",
+   "enable_lora": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "lora_alpha": 16,
+   "lora_dropout": 0.05,
+   "merge_weights": false,
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 8,
+   "target_modules": [
+     "q_proj",
+     "v_proj"
+   ],
+   "task_type": "CAUSAL_LM"
+ }
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:830b58ecc97c15ac9768ac99aa931464814b343e9ba2309da32c942845f0caa4
+ size 26271757
llama-natural-instructions-removebg-preview.png ADDED
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"bos_token": "", "eos_token": "", "model_max_length": 2048, "tokenizer_class": "LlamaTokenizer", "unk_token": ""}