mjbuehler committed
Commit d0da105
1 Parent(s): 083993d

Update README.md

Files changed (1): README.md (+167 -1); license changed from mit to apache-2.0
README.md CHANGED

---
license: apache-2.0
---
# X-LoRA

Mixture of LoRA Experts: leverage the power of fine-tuned LoRA experts by employing a mixture-of-experts (MoE) technique.

X-LoRA works by learning scaling values for the LoRA adapters. These learned scaling values are used to gate the LoRA experts in a dense fashion. Additionally, all LoRA adapters and the base model are frozen, allowing efficient fine-tuning due to the low trainable parameter count.

X-LoRA is easily applied to any Hugging Face Transformers model.

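To make the gating idea concrete, here is a minimal, illustrative sketch (not the library's implementation): for a given layer, each LoRA expert contributes a delta to the frozen base output, and the learned, token- and layer-dependent scalings weight all experts densely.

```python
import torch

def densely_gated_output(base_out, lora_deltas, scalings):
    # Illustrative only -- not the xlora source code.
    # base_out:    (batch, seq, hidden) output of the frozen base layer
    # lora_deltas: list of n_experts tensors, each (batch, seq, hidden),
    #              the outputs of the frozen LoRA adapters
    # scalings:    (batch, seq, n_experts) gating values predicted by the
    #              trainable X-LoRA scaling head
    mixed = base_out
    for i, delta in enumerate(lora_deltas):
        mixed = mixed + scalings[..., i:i + 1] * delta
    return mixed

# Tiny smoke test with random tensors
base = torch.randn(2, 4, 16)
deltas = [torch.randn(2, 4, 16) for _ in range(3)]
gates = torch.softmax(torch.randn(2, 4, 3), dim=-1)
print(densely_gated_output(base, deltas, gates).shape)  # torch.Size([2, 4, 16])
```

Because every expert receives a nonzero weight, the mixing is dense rather than sparse top-k routing.
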
## Features
- Effective: dense gating of experts allows effective mixing
- Efficient fine-tuning: low trainable parameter count
- Hierarchical, encapsulated strategy: re-use existing trained models or model sections to address complex tasks that cut across experts, following a bio-inspired strategy
- Easy-to-use API: `add_xlora_to_model`, broad compatibility
- Dynamic mixing of LoRA adapters: deep, layer-wise combinations of adapters

## X-LoRA source code
```
https://github.com/EricLBuehler/xlora
```

## Converting and loading a model

Example of converting a model to X-LoRA:

```python
import torch
import xlora
from transformers import AutoConfig, AutoModelForCausalLM  # type: ignore

# Load the frozen base model
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16,
)

config = AutoConfig.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="auto",
)

# Convert the model to X-LoRA
model_created = xlora.add_xlora_to_model(
    model=model,
    xlora_config=xlora.xLoRAConfig(config.hidden_size, xlora_depth=8, device=torch.device("cuda")),
    verbose=True,
    adapters={
        "adapter_1": "./path/to/the/checkpoint_adapter_1/",
        "adapter_2": "./path/to/the/checkpoint_adapter_2/",
        "adapter_n": "./path/to/the/checkpoint_adapter_n/",
    },
)
```

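Because the base model and all LoRA adapters are frozen, only the X-LoRA scaling head should require gradients after conversion. A quick sanity check with plain PyTorch, using the `model_created` object returned above:

```python
# Count trainable vs. total parameters of the converted model
n_trainable = sum(p.numel() for p in model_created.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in model_created.parameters())
print(f"Trainable parameters: {n_trainable:,} / {n_total:,} "
      f"({100.0 * n_trainable / n_total:.4f}%)")
```
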

## Loading a trained X-LoRA model from scratch
```python
import torch
import xlora
from transformers import AutoConfig, AutoModelForCausalLM  # type: ignore

# Load the frozen base model
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16,
)

config = AutoConfig.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="auto",
)

# Load the trained X-LoRA scalings and adapters on top of the base model
model = xlora.from_pretrained(
    "./path/to/saved/model",
    model,
    {
        "adapter_1": "./path/to/the/checkpoint/",
        "adapter_2": "./path/to/the/checkpoint/",
        "adapter_n": "./path/to/the/checkpoint/",
    },
    "cuda",
)
```

## Loading a pre-trained X-LoRA model

```python
import torch
from xlora.xlora_utils import load_model  # type: ignore

XLoRA_model_name = "lamm-mit/x-lora/X-LoRA"

# Load the base model together with the trained X-LoRA adapters
model, tokenizer = load_model(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    device="cuda:0",
    dtype=torch.bfloat16,
    fine_tune_model_name=XLoRA_model_name,
    adapters={
        "adapter_1": "lamm-mit/x-lora/X-LoRA_adapters/1/",
        "adapter_2": "lamm-mit/x-lora/X-LoRA_adapters/2/",
        "adapter_3": "lamm-mit/x-lora/X-LoRA_adapters/3/",
        "adapter_4": "lamm-mit/x-lora/X-LoRA_adapters/4/",
        "adapter_5": "lamm-mit/x-lora/X-LoRA_adapters/5/",
        "adapter_6": "lamm-mit/x-lora/X-LoRA_adapters/6/",
        "adapter_7": "lamm-mit/x-lora/X-LoRA_adapters/7/",
        "adapter_8": "lamm-mit/x-lora/X-LoRA_adapters/8/",
        "adapter_9": "lamm-mit/x-lora/X-LoRA_adapters/9/",
    },
)
```

Inference:
```python
def generate_response(model, tokenizer,
                      text_input="What is the best biomaterial for superior strength?",
                      num_return_sequences=1,
                      temperature=1.0,  # the higher the temperature, the more creative the model becomes
                      max_new_tokens=127,
                      num_beams=1,
                      top_k=50,
                      top_p=0.9,
                      repetition_penalty=1.0,
                      eos_token_id=2,
                      add_special_tokens=True,
                      ):
    # Tokenize the prompt and move it to the model's device
    inputs = tokenizer(text_input, return_tensors="pt",
                       add_special_tokens=add_special_tokens).to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            temperature=temperature,  # value used to modulate the next-token probabilities
            num_beams=num_beams,
            top_k=top_k,
            top_p=top_p,
            num_return_sequences=num_return_sequences,
            eos_token_id=eos_token_id,
            pad_token_id=eos_token_id,
            do_sample=True,
            repetition_penalty=repetition_penalty,
        )
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1]:].detach().cpu().numpy(),
                                  skip_special_tokens=True)

output_text = generate_response(model, tokenizer,
                                text_input="What is the best biomaterial for superior strength?",
                                eos_token_id=tokenizer.eos_token_id,
                                num_return_sequences=1, repetition_penalty=1.1,
                                top_p=0.9, top_k=512,
                                temperature=0.5, max_new_tokens=256)

print(output_text[0])
```

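Since `HuggingFaceH4/zephyr-7b-beta` is a chat-tuned model, prompts generally work better when formatted with the tokenizer's chat template before being passed to `generate_response`; a minimal sketch (the prompt text here is only an example):

```python
# Format a user message with the model's chat template, then generate
messages = [
    {"role": "user", "content": "What is the best biomaterial for superior strength?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

output_text = generate_response(model, tokenizer, text_input=prompt,
                                eos_token_id=tokenizer.eos_token_id,
                                temperature=0.5, max_new_tokens=256)
print(output_text[0])
```
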

## Original paper and citation

Cite this work as:
```bibtex
@article{NiBuehler_2024,
    title   = {X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Mixture-of-Experts Framework for Large Language Models with Applications in Protein Mechanics and Design},
    author  = {E. L. Buehler and M. J. Buehler},
    journal = {},
    year    = {2024},
    volume  = {},
    pages   = {},
    url     = {https://arxiv.org/abs/XXXX.YYYYY}
}
```