x-lora / README.md

mjbuehler

Update README.md

6bd5dfc verified 6 months ago

preview code

raw

history blame

No virus

6.56 kB

	---
	license: apache-2.0
	---
	# X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models

	X-LoRA works by learning scaling values for LoRA adapters. These learned scalings values are used to
	gate the LoRA experts in a dense fashion. Additionally, all LoRA adapters and the base model are frozen, allowing efficient fine tuning due to a low parameter count.

	X-LoRA is easily applied to any HuggingFace Transformers model.

	## Features
	- Effective: Dense gating of experts allows effective mixing
	- Efficient fine-tuning: low trainable parameter count
	- Hierarchical encapsulated strategy: Re-use existing trained models or model section and re-use them to address complex tasks that cut across experts, following a bio-inspired strategy
	- Easy-to-use API: `add_xlora_to_model`, broad compatibility
	- Dynamically mix LoRA adapters: Deep layer-wise combinations of adapters.

	## X-LoRA source code

	Install directly from source

	```
	pip install git+https://github.com/EricLBuehler/xlora.git -U
	```

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/JVzaFIISQ780X92VqaHKD.png)

	Further details on installation, packages with source code, API details and more examples:

	[https://github.com/EricLBuehler/xlora](https://github.com/EricLBuehler/xlora)

	## Converting and loading a model

	Example for model conversation:

	```python
	import torch
	import xlora
	from transformers import AutoConfig, AutoModelForCausalLM # type: ignore

	model = AutoModelForCausalLM.from_pretrained(
	"mistralai/Mistral-7B-Instruct-v0.1",
	trust_remote_code=True,
	use_flash_attention_2=False,
	device_map="cuda:0",
	torch_dtype=torch.bfloat16,
	)

	config = AutoConfig.from_pretrained(
	"mistralai/Mistral-7B-Instruct-v0.1",
	trust_remote_code=True,
	use_flash_attention_2=False,
	device_map="auto",
	)

	### Convert the model to X-LoRA
	model_created = xlora.add_xlora_to_model(
	model=model,
	xlora_config=xlora.xLoRAConfig(config.hidden_size, xlora_depth=8, device=torch.device("cuda")),
	verbose=True,
	adapters={
	"adapter_1": "./path/to/the/checkpoint_adapter_1/",
	"adapter_2": "./path/to/the/checkpoint_adapter_2/",
	"adapter_n": "./path/to/the/checkpoint_adapter_3/",
	},
	)
	```

	## Loading a trained X-LoRA model from scratch
	```python
	import torch
	import xlora
	from transformers import AutoConfig, AutoModelForCausalLM # type: ignore

	model = AutoModelForCausalLM.from_pretrained(
	"mistralai/Mistral-7B-Instruct-v0.1",
	trust_remote_code=True,
	use_flash_attention_2=False,
	device_map="cuda:0",
	torch_dtype=torch.bfloat16,
	)

	config = AutoConfig.from_pretrained(
	"mistralai/Mistral-7B-Instruct-v0.1",
	trust_remote_code=True,
	use_flash_attention_2=False,
	device_map="auto",
	)

	model = xlora.from_pretrained(
	"./path/to/saved/model",
	model,
	{
	"adapter_1": "./path/to/the/checkpoint/",
	"adapter_2": "./path/to/the/checkpoint/",
	"adapter_n": "./path/to/the/checkpoint/",
	},
	"cuda",
	)
	```
	## Loading pre-trained X-LoRA model directly from Hugging Face Hub

	```python
	import torch
	from xlora.xlora_utils import load_model

	XLoRa_model_name = 'lamm-mit/x-lora'

	model,tokenizer=load_model(model_name = XLoRa_model_name,
	device='cuda:0',
	use_flash_attention_2=True,
	dtype=torch.bfloat16,
	)
	)
	```
	Inference:
	```python
	def generate_response (model, tokenizer,
	text_input="What is the best biomaterial for superior strength?",
	num_return_sequences = 1,
	temperature = 0.75,
	max_new_tokens = 127,
	num_beams = 1,
	top_k = 50,
	top_p = 0.9,
	repetition_penalty=1.,
	eos_token_id=2,
	add_special_tokens=True,
	):
	inputs = tokenizer(text_input, add_special_tokens=add_special_tokens)
	with torch.no_grad():
	outputs = model.generate(input_ids = inputs["input_ids"],
	attention_mask = inputs["attention_mask"] ,
	max_new_tokens=max_new_tokens,
	temperature=temperature,
	num_beams=num_beams,
	top_k = top_k,
	top_p = top_p,
	num_return_sequences = num_return_sequences,
	eos_token_id=eos_token_id,
	pad_token_id = eos_token_id,
	do_sample =True,
	repetition_penalty=repetition_penalty,
	)
	return tokenizer.batch_decode(outputs[:,inputs["input_ids"].shape[1]:].detach().cpu().numpy(), skip_special_tokens=True)

	output_text=generate_response (model, tokenizer, text_input=txt,eos_token_id=eos_token,
	num_return_sequences=1, repetition_penalty=1.1,
	top_p=0.9, top_k=512,
	temperature=0.5,
	max_new_tokens=256)

	print (output_text[0])
	```

	## Dataset

	See [lamm-mit/x-lora-dataset](https://huggingface.co/datasets/lamm-mit/x-lora-dataset) for the dataset used to train the X-LoRA model. Details on the datasets used to train the original adapters are included in the paper (see reference below).

	## Sample results

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/GRbDJcIqkZZrQAVXyKB2H.png)

	## Acknowledgements

	This work is built on the Hugging Face [PEFT library](https://github.com/huggingface/peft/tree/main/) and other components in the Hugging Face ecosystem. We acknowledge the authors of this excellent library and related methods.

	## Original paper and citation

	Cite this work as:
	```bibtex
	@article{Buehler_XLoRA_2024,
	title = {X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Design},
	author = {E.L. Buehler, M.J. Buehler},
	journal = {},
	year = {2024},
	volume = {},
	pages = {},
	url = {https://arxiv.org/abs/2402.07148}
	}
	```