---
license: apache-2.0
---
# X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models
X-LoRA works by learning scaling values for LoRA adapters. These learned scaling values are used to
gate the LoRA experts in a dense fashion. Additionally, all LoRA adapters and the base model are frozen, allowing efficient fine-tuning due to the low trainable parameter count.
X-LoRA is easily applied to any HuggingFace Transformers model.
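Conceptually, a small scaling head predicts one scaling value per adapter from the hidden states, and the frozen LoRA outputs are mixed densely with those scalings. The sketch below is an illustrative, simplified PyTorch version of this gating idea, not the library's implementation; all names (`DenseLoRAGate`, `lora_deltas`, the softmax over adapters) are hypothetical simplifications.
```python
import torch
import torch.nn as nn

class DenseLoRAGate(nn.Module):
    """Illustrative sketch: predict per-adapter scalings and densely mix frozen LoRA outputs."""

    def __init__(self, hidden_size: int, n_adapters: int):
        super().__init__()
        # In this sketch, the scaling head is the only trainable component.
        self.scaling_head = nn.Linear(hidden_size, n_adapters)

    def forward(self, hidden_states: torch.Tensor, lora_deltas: list) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden); lora_deltas: one frozen LoRA output per adapter.
        scalings = torch.softmax(self.scaling_head(hidden_states), dim=-1)  # (batch, seq, n_adapters)
        mixed = torch.zeros_like(lora_deltas[0])
        for i, delta in enumerate(lora_deltas):
            # Dense mixture: every expert contributes, weighted by its learned scaling.
            mixed = mixed + scalings[..., i:i + 1] * delta
        return mixed
```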
## Features
- Effective: Dense gating of experts allows effective mixing
- Efficient fine-tuning: low trainable parameter count
- Hierarchical encapsulated strategy: Re-use existing trained models or model sections and combine them to address complex tasks that cut across experts, following a bio-inspired strategy
- Easy-to-use API: `add_xlora_to_model`, broad compatibility
- Dynamically mix LoRA adapters: Deep layer-wise combinations of adapters.
## X-LoRA source code
Installation, source code, API details and more examples:
[https://github.com/EricLBuehler/xlora](https://github.com/EricLBuehler/xlora)
## Converting and loading a model
Example of model conversion:
```python
import torch
import xlora
from transformers import AutoConfig, AutoModelForCausalLM  # type: ignore

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="cuda:0",
    torch_dtype=torch.bfloat16,
)

config = AutoConfig.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    trust_remote_code=True,
    use_flash_attention_2=False,
    device_map="auto",
)

### Convert the model to X-LoRA
model_created = xlora.add_xlora_to_model(
    model=model,
    xlora_config=xlora.xLoRAConfig(config.hidden_size, xlora_depth=8, device=torch.device("cuda")),
    verbose=True,
    adapters={
        "adapter_1": "./path/to/the/checkpoint_adapter_1/",
        "adapter_2": "./path/to/the/checkpoint_adapter_2/",
        "adapter_n": "./path/to/the/checkpoint_adapter_3/",
    },
)
```
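After conversion, only the X-LoRA scaling head should require gradients, since the base model and the LoRA adapters are frozen. A quick sanity check of the trainable parameter count, using only plain PyTorch (no xlora-specific API assumed):
```python
# Count trainable vs. total parameters of the converted model (standard PyTorch only).
trainable = sum(p.numel() for p in model_created.parameters() if p.requires_grad)
total = sum(p.numel() for p in model_created.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,} ({100 * trainable / total:.3f}%)")
```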
## Loading a trained X-LoRA model from scratch
```python
import torch
import xlora
from transformers import AutoConfig, AutoModelForCausalLM # type: ignore
model = AutoModelForCausalLM.from_pretrained(
"mistralai/Mistral-7B-Instruct-v0.1",
trust_remote_code=True,
use_flash_attention_2=False,
device_map="cuda:0",
torch_dtype=torch.bfloat16,
)
config = AutoConfig.from_pretrained(
"mistralai/Mistral-7B-Instruct-v0.1",
trust_remote_code=True,
use_flash_attention_2=False,
device_map="auto",
)
model = xlora.from_pretrained(
"./path/to/saved/model",
model,
{
"adapter_1": "./path/to/the/checkpoint/",
"adapter_2": "./path/to/the/checkpoint/",
"adapter_n": "./path/to/the/checkpoint/",
},
"cuda",
)
```
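This loading path does not return a tokenizer, so load one separately if you want to run a quick generation check. A minimal sketch using only standard Transformers APIs; the prompt is just an example:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

# Short greedy generation to confirm the X-LoRA model loaded correctly.
inputs = tokenizer("What is the best biomaterial for superior strength?", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```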
## Loading a pre-trained X-LoRA model
```python
import torch
from xlora.xlora_utils import load_model  # type: ignore

XLoRA_model_name = "lamm-mit/x-lora/X-LoRA"

model, tokenizer = load_model(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    device="cuda:0",
    dtype=torch.bfloat16,
    fine_tune_model_name=XLoRA_model_name,
    adapters={
        "adapter_1": "lamm-mit/x-lora/X-LoRA_adapters/1/",
        "adapter_2": "lamm-mit/x-lora/X-LoRA_adapters/2/",
        "adapter_3": "lamm-mit/x-lora/X-LoRA_adapters/3/",
        "adapter_4": "lamm-mit/x-lora/X-LoRA_adapters/4/",
        "adapter_5": "lamm-mit/x-lora/X-LoRA_adapters/5/",
        "adapter_6": "lamm-mit/x-lora/X-LoRA_adapters/6/",
        "adapter_7": "lamm-mit/x-lora/X-LoRA_adapters/7/",
        "adapter_8": "lamm-mit/x-lora/X-LoRA_adapters/8/",
        "adapter_9": "lamm-mit/x-lora/X-LoRA_adapters/9/",
    },
)
```
Inference:
```python
def generate_response(model, tokenizer,
                      text_input="What is the best biomaterial for superior strength?",
                      num_return_sequences=1,
                      temperature=0.75,
                      max_new_tokens=127,
                      num_beams=1,
                      top_k=50,
                      top_p=0.9,
                      repetition_penalty=1.,
                      eos_token_id=2,
                      add_special_tokens=True,
                      ):
    # Tokenize the prompt and move it to the model's device.
    inputs = tokenizer(text_input, add_special_tokens=add_special_tokens, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(input_ids=inputs["input_ids"],
                                 attention_mask=inputs["attention_mask"],
                                 max_new_tokens=max_new_tokens,
                                 temperature=temperature,
                                 num_beams=num_beams,
                                 top_k=top_k,
                                 top_p=top_p,
                                 num_return_sequences=num_return_sequences,
                                 eos_token_id=eos_token_id,
                                 pad_token_id=eos_token_id,
                                 do_sample=True,
                                 repetition_penalty=repetition_penalty,
                                 )
    # Decode only the newly generated tokens (everything after the prompt).
    return tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1]:].detach().cpu().numpy(), skip_special_tokens=True)

output_text = generate_response(model, tokenizer,
                                text_input="What is the best biomaterial for superior strength?",
                                eos_token_id=tokenizer.eos_token_id,
                                num_return_sequences=1, repetition_penalty=1.1,
                                top_p=0.9, top_k=512,
                                temperature=0.5,
                                max_new_tokens=256)
print(output_text[0])
```
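Since `HuggingFaceH4/zephyr-7b-beta` is an instruction-tuned chat model, results may improve if the prompt is formatted with the tokenizer's chat template before calling `generate_response`. A small sketch using the standard Transformers `apply_chat_template` API; the message content is only an example:
```python
# Format the prompt with the tokenizer's chat template (zephyr is a chat/instruct model).
messages = [{"role": "user", "content": "What is the best biomaterial for superior strength?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

output_text = generate_response(model, tokenizer, text_input=prompt,
                                eos_token_id=tokenizer.eos_token_id,
                                max_new_tokens=256, temperature=0.5)
print(output_text[0])
```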
## Acknowledgements
This work is built on the Hugging Face [PEFT library](https://github.com/huggingface/peft/tree/main/src/peft) and other components in the Hugging Face ecosystem.
## Original paper and citation
Cite this work as:
```bibtex
@article{NiBuehler_2024,
title = {X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Design},
author = {E. L. Buehler and M. J. Buehler},
journal = {},
year = {2024},
volume = {},
pages = {},
url = {https://arxiv.org/abs/XXXX.YYYYY}
}
```