Update README.md
README.md CHANGED
@@ -1,3 +1,169 @@
---
license: apache-2.0
---

# X-LoRA

Mixture of LoRA Experts: Leverage the power of fine-tuned LoRA experts by employing a mixture-of-experts (MoE) technique.

X-LoRA works by learning scaling values for LoRA adapters. These learned scaling values are used to
gate the LoRA experts in a dense fashion. Additionally, all LoRA adapters and the base model are frozen, allowing efficient fine-tuning due to a low parameter count.
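
For intuition, here is a minimal sketch of the dense-gating idea. This is an illustration only, not the library's actual implementation; the layer class and the `gate` head below are hypothetical stand-ins for X-LoRA's learned scalings:

```python
import torch
import torch.nn as nn


class DenselyGatedLoRALinear(nn.Module):
    """Illustrative layer: a frozen base projection plus several frozen LoRA
    experts, each weighted by a learned, input-dependent scaling."""

    def __init__(self, base: nn.Linear, lora_experts: list[nn.Module]):
        super().__init__()
        self.base = base                                  # frozen pretrained weights
        self.lora_experts = nn.ModuleList(lora_experts)   # frozen LoRA adapters
        # Hypothetical gating head producing one scaling per expert; in this
        # sketch it is the only trainable part.
        self.gate = nn.Linear(base.in_features, len(lora_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dense gating: every expert receives a weight for every token
        # (no hard top-k routing).
        scalings = torch.softmax(self.gate(x), dim=-1)
        out = self.base(x)
        for i, lora in enumerate(self.lora_experts):
            out = out + scalings[..., i : i + 1] * lora(x)
        return out
```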

X-LoRA is easily applied to any HuggingFace Transformers model.

## Features
- Effective: Dense gating of experts allows effective mixing
- Efficient fine-tuning: low trainable parameter count
- Hierarchical encapsulated strategy: Re-use existing trained models or model sections to address complex tasks that cut across experts, following a bio-inspired strategy
- Easy-to-use API: `add_xlora_to_model`, broad compatibility
- Dynamically mix LoRA adapters: Deep layer-wise combinations of adapters

## X-LoRA source code
```
https://github.com/EricLBuehler/xlora
```

## Converting and loading a model

Example of model conversion:

```python
import torch
import xlora
from transformers import AutoConfig, AutoModelForCausalLM  # type: ignore

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16,
)

config = AutoConfig.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="auto",
)

### Convert the model to X-LoRA
model_created = xlora.add_xlora_to_model(
    model=model,
    xlora_config=xlora.xLoRAConfig(config.hidden_size, xlora_depth=8, device=torch.device("cuda")),
    verbose=True,
    adapters={
        "adapter_1": "./path/to/the/checkpoint_adapter_1/",
        "adapter_2": "./path/to/the/checkpoint_adapter_2/",
        "adapter_n": "./path/to/the/checkpoint_adapter_3/",
    },
)
```
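
After conversion, the base model and the LoRA experts are frozen and only the X-LoRA parameters should require gradients. A quick, generic sanity check of the low trainable parameter count (plain PyTorch, not an X-LoRA-specific API):

```python
# Count trainable vs. total parameters of the converted model.
trainable = sum(p.numel() for p in model_created.parameters() if p.requires_grad)
total = sum(p.numel() for p in model_created.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```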

## Loading a trained X-LoRA model from scratch

```python
import torch
import xlora
from transformers import AutoConfig, AutoModelForCausalLM  # type: ignore

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16,
)

config = AutoConfig.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="auto",
)

model = xlora.from_pretrained(
    "./path/to/saved/model",
    model,
    {
        "adapter_1": "./path/to/the/checkpoint/",
        "adapter_2": "./path/to/the/checkpoint/",
        "adapter_n": "./path/to/the/checkpoint/",
    },
    "cuda",
)
```

## Loading a pre-trained X-LoRA model

```python
import torch
from xlora.xlora_utils import load_model  # type: ignore

XLoRA_model_name = "lamm-mit/x-lora/X-LoRA"

model, tokenizer = load_model(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    device="cuda:0",
    dtype=torch.bfloat16,
    fine_tune_model_name=XLoRA_model_name,
    adapters={
        "adapter_1": "lamm-mit/x-lora/X-LoRA_adapters/1/",
        "adapter_2": "lamm-mit/x-lora/X-LoRA_adapters/2/",
        "adapter_3": "lamm-mit/x-lora/X-LoRA_adapters/3/",
        "adapter_4": "lamm-mit/x-lora/X-LoRA_adapters/4/",
        "adapter_5": "lamm-mit/x-lora/X-LoRA_adapters/5/",
        "adapter_6": "lamm-mit/x-lora/X-LoRA_adapters/6/",
        "adapter_7": "lamm-mit/x-lora/X-LoRA_adapters/7/",
        "adapter_8": "lamm-mit/x-lora/X-LoRA_adapters/8/",
        "adapter_9": "lamm-mit/x-lora/X-LoRA_adapters/9/",
    },
)
```

Inference:
```python
def generate_response(model, tokenizer, text_input="What is the best biomaterial for superior strength?",
                      num_return_sequences=1,
                      temperature=1.,  # the higher the temperature, the more creative the model becomes
                      max_new_tokens=127,
                      num_beams=1,
                      top_k=50,
                      top_p=0.9, repetition_penalty=1., eos_token_id=2, verbatim=False,
                      exponential_decay_length_penalty_fac=None, add_special_tokens=True,
                      ):
    inputs = tokenizer(text_input, add_special_tokens=add_special_tokens, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(input_ids=inputs["input_ids"],
                                 attention_mask=inputs["attention_mask"],  # usually produced automatically by the tokenizer
                                 max_new_tokens=max_new_tokens,
                                 temperature=temperature,  # value used to modulate the next-token probabilities
                                 num_beams=num_beams,
                                 top_k=top_k,
                                 top_p=top_p,
                                 num_return_sequences=num_return_sequences,
                                 eos_token_id=eos_token_id,
                                 pad_token_id=eos_token_id,
                                 do_sample=True,
                                 repetition_penalty=repetition_penalty,
                                 )
    return tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1]:].detach().cpu().numpy(), skip_special_tokens=True)

txt = "What is the best biomaterial for superior strength?"
eos_token = tokenizer.eos_token_id

output_text = generate_response(model, tokenizer, text_input=txt, eos_token_id=eos_token,
                                num_return_sequences=1, repetition_penalty=1.1,
                                top_p=0.9, top_k=512,
                                temperature=0.5, max_new_tokens=256)

print(output_text[0])
```
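
Optionally, tokens can be printed as they are generated instead of only after `generate` returns. This is a small sketch using the standard `transformers` `TextStreamer`; it is not specific to X-LoRA:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced; skip_prompt hides the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("What is the best biomaterial for superior strength?", return_tensors="pt").to(model.device)
_ = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.5, streamer=streamer)
```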

## Original paper and citation

Cite this work as:
```bibtex
@article{NiBuehler_2024,
    title   = {X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Mixture-of-Experts Framework for Large Language Models with Applications in Protein Mechanics and Design},
    author  = {E. L. Buehler and M. J. Buehler},
    journal = {},
    year    = {2024},
    volume  = {},
    pages   = {},
    url     = {https://arxiv.org/abs/XXXX.YYYYY}
}
```