---
license: mit
datasets:
- Nebulous/gpt4all_pruned
- sahil2801/CodeAlpaca-20k
- yahma/alpaca-cleaned
language:
- en
tags:
- sft
pipeline_tag: text-generation
widget:
- text: <|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
- text: <|prompter|>What's the Earth total population</s><|assistant|>
- text: <|prompter|>Write a story about future of AI development</s><|assistant|>
---

# LoRA Adapter for LLaMA 13B trained on more datasets than tloen/alpaca-lora-7b

This repo contains a low-rank adapter for **LLaMA-13B**, fit on:
- `Nebulous/gpt4all_pruned`
- `sahil2801/CodeAlpaca-20k`
- `yahma/alpaca-cleaned`
- datasets that are part of the OpenAssistant project.

This version of the weights was trained with the following hyperparameters:

- Epochs: 2
- Batch size: 128
- Max length: 2048
- Learning rate: 4e-6
- LoRA _r_: 16
- LoRA alpha: 32
- LoRA target modules: q_proj, k_proj, v_proj, o_proj

The model was trained with flash attention and gradient checkpointing.
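
For reference, the LoRA hyperparameters above correspond roughly to the following `peft` configuration. This is a minimal illustrative sketch, not the actual training script; the `lora_dropout` and `bias` values are assumptions.

```python
from peft import LoraConfig, TaskType

# Illustrative LoRA configuration matching the hyperparameters listed above.
# lora_dropout and bias are assumed values; the original training setup may differ.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                                     # LoRA rank
    lora_alpha=32,                                            # LoRA scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # adapted attention projections
    lora_dropout=0.05,                                        # assumed
    bias="none",                                              # assumed
)
```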


## Model Details

- **Developed** as part of the OpenAssistant Project
- **Model type:** PEFT adapter for a frozen LLaMA base model
- **Language:** English

## Prompting

Two special tokens are used to mark the beginning of the user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with an end-of-sequence token
(`</s>` for this LLaMA-based model).

Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should
start generating the assistant reply.
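
As a small illustration of this format, the helper below assembles a prompt string from one or more turns. It is a minimal sketch and not part of this repo; the `build_prompt` helper is hypothetical.

```python
# Hypothetical helper (not part of this repo): assemble an OpenAssistant-style prompt.
# Each turn is wrapped in <|prompter|> or <|assistant|> and ends with the </s> EOS token.
def build_prompt(turns, eos_token="</s>"):
    """turns: list of (role, text) pairs, where role is 'prompter' or 'assistant'."""
    prompt = ""
    for role, text in turns:
        prompt += f"<|{role}|>{text}{eos_token}"
    # End with the assistant tag so the model continues with its reply.
    return prompt + "<|assistant|>"


print(build_prompt([("prompter", "What is a meme, and what's the history behind this word?")]))
# <|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
```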

## Example Inference Code

Note: several special-token embeddings need to be loaded alongside the LoRA weights. The example assumes a GPU and `torch.float16`.

```python
from huggingface_hub import hf_hub_download
import torch
import transformers
from peft import PeftModel
from transformers import GenerationConfig

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = transformers.AutoTokenizer.from_pretrained("jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b")

# Load the frozen base model in fp16.
model = transformers.AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-13b-hf", torch_dtype=torch.float16
)
# This repo also contains embeddings for several special tokens, so the embedding
# matrix has to be resized before they can be loaded.
model.resize_token_embeddings(32016)

model.config.eos_token_id = tokenizer.eos_token_id
model.config.bos_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Load the LoRA adapter on top of the frozen base model.
lora_weights = "jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b"
model = PeftModel.from_pretrained(
    model,
    lora_weights,
    torch_dtype=torch.float16,
)

model.eos_token_id = tokenizer.eos_token_id

# Load the special-token embeddings and copy them into the newly added rows.
filename = hf_hub_download("jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b", "extra_embeddings.pt")
embed_weights = torch.load(
    filename, map_location=torch.device("cuda" if torch.cuda.is_available() else "cpu")
)
model.base_model.model.model.embed_tokens.weight[32000:, :] = embed_weights.to(
    model.base_model.model.model.embed_tokens.weight.dtype
).to(device)

model = model.half().to(device)
generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=4,
)


def format_system_prompt(prompt, eos_token="</s>"):
    # OpenAssistant prompt format: the user turn ends with the EOS token and the
    # input ends with <|assistant|> so the model starts generating the reply.
    return "{}{}{}{}".format(
        "<|prompter|>",
        prompt,
        eos_token,
        "<|assistant|>",
    )


def generate(prompt, generation_config=generation_config, max_new_tokens=2048, device=device):
    prompt = format_system_prompt(prompt)  # OpenAssistant prompt format expected
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=max_new_tokens,
            eos_token_id=2,  # </s> for LLaMA
        )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    print("Text generated:")
    print(output)
    return output


generate("What is a meme, and what's the history behind this word?")
generate("What's the Earth total population")
generate("Write a story about future of AI development")
```
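
If you want to serve the model without the `peft` wrapper, recent versions of `peft` let you fold the adapter into the base weights with `merge_and_unload()`. A minimal sketch, assuming `model` and `tokenizer` were built as above and using a placeholder output directory:

```python
# Optional: merge the LoRA weights into the base model for deployment.
# "merged-model" is a placeholder output directory.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged-model")
tokenizer.save_pretrained("merged-model")
```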