qlora llama 70b openorca
- README.md +118 -0
- adapter_config.json +22 -0
- adapter_model.bin +3 -0
README.md
CHANGED
@@ -1,3 +1,121 @@
---
license: cc-by-nc-4.0
---

# QLoRA Instruction Tuned Models

| [Paper](https://arxiv.org/abs/2305.14314) | [Code](https://github.com/artidoro/qlora) |

**The `LLaMA-2 QLoRA OpenOrca` models are open-source models obtained through 4-bit QLoRA tuning of LLaMA-2 base models on 240k examples of the OpenOrca dataset.**

⚠️ These models are purely intended for research purposes and could produce problematic outputs.

## What are QLoRA Instruction Tuned Models and why use them?
- **Strong performance on MMLU** following the QLoRA instruction tuning.
- **Replicable and efficient instruction tuning procedure** that can be extended to new use cases. QLoRA training scripts are available in the [QLoRA repo](https://github.com/artidoro/qlora).
- **Rigorous comparison to 16-bit methods** (both 16-bit full-finetuning and LoRA) in [our paper](https://arxiv.org/abs/2305.14314) demonstrates the effectiveness of 4-bit QLoRA finetuning.
- **Lightweight** checkpoints which only contain adapter weights.

## License and Intended Use
Note that use of these adapter weights requires access to the LLaMA-2 model weights, and they should therefore be used according to the LLaMA-2 license.

## Usage
Here is an example of how you would load the model in 4 bits:
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-2-70b-hf"
adapters_name = 'uwnlp/llama-2-70b-qlora-openorca'

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4'
    ),
)
model = PeftModel.from_pretrained(model, adapters_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
Inference can then be performed as usual with HF models as follows:
```python
prompt = "Introduce yourself"
formatted_prompt = (
    f"A chat between a curious human and an artificial intelligence assistant. "
    f"The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
    f"### Human: {prompt} ### Assistant:"
)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to("cuda:0")
outputs = model.generate(inputs=inputs.input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Expected output similar to the following:
```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
### Human: Introduce yourself ### Assistant: I am an artificial intelligence assistant. I am here to help you with any questions you may have.
```
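The `### Human:` / `### Assistant:` template used above can be factored into a small helper for multi-turn exchanges. The sketch below is illustrative only; the function name and the way earlier turns are concatenated are assumptions, not an officially specified chat template:

```python
SYSTEM = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
)

def format_prompt(message, history=()):
    # Concatenate earlier (human, assistant) turns, then append the new message
    # so the model continues generating after the final "### Assistant:" marker.
    turns = "".join(
        f"### Human: {user} ### Assistant: {assistant} " for user, assistant in history
    )
    return f"{SYSTEM}{turns}### Human: {message} ### Assistant:"

formatted_prompt = format_prompt("Introduce yourself")
```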

## Model Card
**Architecture**: The models released here are LoRA adapters to be used on top of LLaMA-2 models. They are added to all linear layers. For all model sizes, we use $r=64$.
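
For reference, the adapter configuration released in this repository (see `adapter_config.json` below) corresponds to roughly the following PEFT `LoraConfig`; this is a minimal sketch rather than the exact training code:

```python
from peft import LoraConfig, TaskType

# Mirrors adapter_config.json: rank-64 adapters with alpha 16 and dropout 0.05
# on every linear projection of the LLaMA-2 architecture.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```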

**Base Model**: These models use LLaMA-2 as the base model. LLaMA-2 is a causal language model pretrained on a large corpus of text. See the [LLaMA-2 paper](https://arxiv.org/abs/2307.09288) for more details. Note that these models can inherit the biases and limitations of the base model.

**Finetuning Data**: These models are finetuned on 240k examples of the [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) dataset.
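
If you want to inspect the finetuning data, it can be loaded from the Hub with 🤗 `datasets`. The subsampling below (shuffling and taking 240k rows) only illustrates how one might select a 240k-example subset; it is not a claim about the exact subset used for these checkpoints:

```python
from datasets import load_dataset

# OpenOrca rows contain "system_prompt", "question", and "response" fields.
openorca = load_dataset("Open-Orca/OpenOrca", split="train")

# Illustrative 240k-example subset (seed and selection strategy are assumptions).
subset = openorca.shuffle(seed=0).select(range(240_000))
print(subset[0]["question"])
```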

**Languages**: The different finetuning datasets cover different languages. We refer to the papers and resources describing each dataset for more details.

Next, we describe Training and Evaluation details.

### Training
QLoRA Instruction Tuned Models are the result of 4-bit QLoRA supervised finetuning on different instruction tuning datasets.

All models use the NormalFloat4 (NF4) datatype for the base model and LoRA adapters on all linear layers, with BFloat16 as the computation datatype. We set LoRA $r=64$, $\alpha=16$. We also use an Adam beta2 of 0.999, a max grad norm of 0.3, and a LoRA dropout of 0.1 for models up to 13B and of 0.05 for the 33B and 65B/70B models.
For the finetuning process, we use a constant learning rate schedule and the paged AdamW optimizer.

### Training hyperparameters
| Parameters | Dataset | Batch size | LR   | Steps | Source Length | Target Length |
|------------|---------|------------|------|-------|---------------|---------------|
| 7B         | All     | 16         | 2e-4 | 10000 | 384           | 128           |
| 13B        | All     | 16         | 2e-4 | 10000 | 384           | 128           |
| 70B        | All     | 64         | 1e-4 | 2500  | 384           | 128           |
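
As a rough guide, the 70B row above, combined with the optimizer settings described in the Training section, maps onto Hugging Face `TrainingArguments` roughly as sketched below. This is a hypothetical mapping for illustration; the checkpoints were trained with the scripts in the [QLoRA repo](https://github.com/artidoro/qlora), and the per-device batch size / gradient accumulation split is assumed:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-2-70b-qlora-openorca",
    max_steps=2500,                    # Steps (70B row)
    learning_rate=1e-4,                # LR (70B row)
    per_device_train_batch_size=4,     # assumed split: 4 per device with
    gradient_accumulation_steps=16,    # 16-step accumulation = batch size 64
    lr_scheduler_type="constant",      # constant learning rate schedule
    optim="paged_adamw_32bit",         # paged AdamW optimizer
    adam_beta2=0.999,
    max_grad_norm=0.3,
    bf16=True,                         # BFloat16 computation datatype
)
# Source/target lengths (384/128 tokens) are enforced when tokenizing and
# collating the data, not through TrainingArguments.
```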

### Evaluation
We use the MMLU benchmark to measure performance on a range of language understanding tasks. This is a multiple-choice benchmark covering 57 tasks including elementary mathematics, US history, computer science, law, and more. We report 5-shot test accuracy.
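
One way to reproduce a 5-shot MMLU run is with the EleutherAI lm-evaluation-harness. The sketch below assumes the `lm-eval` v0.4-style Python API and is not necessarily the harness or configuration used for the numbers reported here:

```python
# pip install lm-eval  (EleutherAI lm-evaluation-harness, v0.4 API assumed)
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=meta-llama/Llama-2-70b-hf,"
        "peft=uwnlp/llama-2-70b-qlora-openorca,"
        "load_in_4bit=True"
    ),
    tasks=["mmlu"],
    num_fewshot=5,
)
print(results["results"])
```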

| Dataset            | 7B   | 13B  | 33B  | 65B  |
|--------------------|------|------|------|------|
| LLaMA-1 no tuning  | 35.1 | 46.9 | 57.8 | 63.4 |
| Self-Instruct      | 36.4 | 33.3 | 53.0 | 56.7 |
| Longform           | 32.1 | 43.2 | 56.6 | 59.7 |
| Chip2              | 34.5 | 41.6 | 53.6 | 59.8 |
| HH-RLHF            | 34.9 | 44.6 | 55.8 | 60.1 |
| Unnatural Instruct | 41.9 | 48.1 | 57.3 | 61.3 |
| OASST1 (Guanaco)   | 36.6 | 46.4 | 57.0 | 62.2 |
| Alpaca             | 38.8 | 47.8 | 57.3 | 62.5 |
| FLAN v2            | 44.5 | 51.4 | 59.2 | 63.9 |

| Dataset            | 7B   | 13B  | 34B  | 70B  |
|--------------------|------|------|------|------|
| LLaMA-2 no tuning  | 45.3 | 54.8 | 62.6 | 68.9 |
| OpenOrca           | 45.0 |      |      | 69.0 |

## Citation

```bibtex
@article{dettmers2023qlora,
  title={QLoRA: Efficient Finetuning of Quantized LLMs},
  author={Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke},
  journal={arXiv preprint arXiv:2305.14314},
  year={2023}
}
```

adapter_config.json
ADDED
@@ -0,0 +1,22 @@
{
  "base_model_name_or_path": "meta-llama/Llama-2-70b-hf",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "lora_alpha": 16.0,
  "lora_dropout": 0.05,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 64,
  "target_modules": [
    "v_proj",
    "k_proj",
    "down_proj",
    "o_proj",
    "q_proj",
    "up_proj",
    "gate_proj"
  ],
  "task_type": "CAUSAL_LM"
}

adapter_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f21d5abca6f23a6a2a8c554dd68ed596361ab8c7a2c60f721ed5765f36df9a1d
size 1657155077