update README
README.md
# Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

## News

- 3/12/2024 - We released Qwen2idae-16x14B-v1.0 on 🤗 [HuggingFace](https://huggingface.co/hywu/Qwen2idae-16x14B-v1.0), which shows strong performance on math and code tasks with 15B activated params.
- 2/7/2024 - [Serp-ai](https://github.com/serp-ai/Parameter-Efficient-MoE) adds [unsloth](https://github.com/serp-ai/unsloth) support for faster and more memory-efficient training of our Parameter-Efficient Sparsity Crafting and releases new [sparsetral](https://huggingface.co/serpdotai/sparsetral-16x7B-v2) models based on Mistral-7B.
- 1/10/2024 - Camelidae models are now available on 🤗 [HuggingFace](https://huggingface.co/hywu).
- 1/4/2024 - We released the paper, [Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731).
- 12/22/2023 - We released the training repo that crafts dense models with the LLaMA architecture into MoE models.

## Introduction

Camelidae and Qwen2idae models are trained using Parameter-Efficient Sparsity Crafting techniques.

We present Parameter-Efficient Sparsity Crafting to help dense models learn knowledge from different fields (including code and math). The approach performs instruction tuning while efficiently utilizing an MoE structure.

Specifically, Parameter-Efficient Sparsity Crafting uses parameter-efficient techniques, including [QLoRA](https://arxiv.org/abs/2305.14314) and [Adapter](https://arxiv.org/abs/1902.00751), to perform efficient [Sparse Upcycling](https://arxiv.org/abs/2212.05055).
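
To make this concrete, below is a minimal PyTorch sketch of adapter-based sparse upcycling: the dense FFN of the original model is kept frozen and shared, each expert contributes only a small trainable bottleneck adapter, and a router mixes the top-k experts per token. Everything here (class name, sizes, routing details) is an illustrative assumption, not the released implementation, which additionally applies QLoRA to the base weights.

```python
# Illustrative sketch only -- class and argument names are made up for this example.
import torch
import torch.nn as nn


class AdapterExpertMLP(nn.Module):
    """A dense FFN upcycled into a sparse MoE layer via per-expert adapters and a router."""

    def __init__(self, hidden_size=1024, adapter_size=64, num_experts=8, top_k=2):
        super().__init__()
        # Shared FFN taken from the dense model; it stays frozen during tuning.
        self.shared_mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.SiLU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )
        for p in self.shared_mlp.parameters():
            p.requires_grad_(False)
        # Each expert adds only a small bottleneck adapter -- the parameter-efficient part.
        self.adapters = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, adapter_size),
                nn.SiLU(),
                nn.Linear(adapter_size, hidden_size),
            )
            for _ in range(num_experts)
        )
        self.router = nn.Linear(hidden_size, num_experts)  # trainable token router
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, hidden_size)
        out = self.shared_mlp(x)                            # dense path for every token
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        extra = torch.zeros_like(out)
        for k in range(self.top_k):
            for e, adapter in enumerate(self.adapters):
                mask = idx[:, k] == e                       # tokens routed to expert e
                if mask.any():
                    extra[mask] += weights[mask, k].unsqueeze(-1) * adapter(x[mask])
        return out + extra


# Quick shape check.
layer = AdapterExpertMLP()
print(layer(torch.randn(5, 1024)).shape)  # torch.Size([5, 1024])
```

In this sketch only the adapters and the router carry trainable parameters, which is what keeps crafting the MoE model cheap to instruction-tune.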

## Model Lists

| Camelidae Series | Download |
|---|---|
| Camelidae-8x7B | 🤗 [HuggingFace](https://huggingface.co/hywu/Camelidae-8x7B) |
| Camelidae-8x13B | 🤗 [HuggingFace](https://huggingface.co/hywu/Camelidae-8x13B) |
| Camelidae-8x34B | 🤗 [HuggingFace](https://huggingface.co/hywu/Camelidae-8x34B) |
| Camelidae-8x34B-pro | 🤗 Coming Soon |

| Qwen2idae Series | Download |
|---|---|
| Qwen2idae-16x14B-v1.0 | 🤗 [HuggingFace](https://huggingface.co/hywu/Qwen2idae-16x14B-v1.0) |
| Qwen2idae-16x7B-v1.0 | 🤗 Coming Soon |
| Qwen2idae-16x1.8B-v1.0 | 🤗 Coming Soon |

## Performance

| Model | Activated Params | MMLU (5shot) | GSM8k (5shot) | MATH (4shot) | HumanEval (0shot) | MBPP (4shot) | HellaSwag (10shot) |
|:-----:|:----------------:|:------------:|:-------------:|:------------:|:-----------------:|:------------:|:------------------:|
| GPT3.5 | - | 70.0% | 57.1% | <font color=#F67F70>**34.1%**</font> | <font color=#FBD98D>**48.1%**</font> | - | <font color=#7FEA9E>**85.5%**</font> |
| LLaMA2-70B-chat | 70B | 63.8% | 59.3% | 10.4% | 32.3% | 35.6% | 84.8% |
| Camelidae-8x34B-pro | 35B | <font color=#7FEA9E>**75.7%**</font> | <font color=#F67F70>**79.4%**</font> | <font color=#FBD98D>**24.0%**</font> | <font color=#7FEA9E>**48.8%**</font> | <font color=#7FEA9E>**43.2%**</font> | 85.2% |
| Camelidae-8x34B | 35B | <font color=#FBD98D>**75.6%**</font> | <font color=#7FEA9E>**78.3%**</font> | 22.6% | 43.9% | <font color=#FBD98D>**41.4%**</font> | <font color=#FBD98D>**85.3%**</font> |
| SUSChat-34B | 34B | <font color=#F67F70>**76.4%**</font> | 72.3% | 22.0% | 11.6% | 40.2% | 83.9% |
| Yi-34B-chat | 34B | 74.8% | 67.6% | 17.3% | 20.1% | 41.0% | 83.9% |
| Qwen2idae-16x14B-v1.0 | 15B | 66.7% | <font color=#FBD98D>**77.8%**</font> | <font color=#7FEA9E>**29.9%**</font> | <font color=#F67F70>**62.8%**</font> | <font color=#F67F70>**48.6%**</font> | 82.3% |
| Mixtral-8x7B-instruct | 14B | 68.7% | 71.7% | 22.1% | 25.6% | 40.6% | <font color=#F67F70>**86.5%**</font> |
| Camelidae-8x13B | 13B | 54.4% | 52.6% | 9.8% | 30.6% | 30.4% | 82.5% |
| LLaMA2-13B-chat | 13B | 53.9% | 37.1% | 5.2% | 18.9% | 27.2% | 81.9% |
| Camelidae-8x7B | 7B | 48.3% | 44.0% | 5.8% | 18.3% | 23.4% | 79.2% |
| LLaMA2-7B-chat | 7B | 47.2% | 26.3% | 3.9% | 12.2% | 17.6% | 78.6% |

We bold the top-3 scores in each column.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the MoE model; trust_remote_code=True is required because
# the MoE modeling code ships with the checkpoint.
# Swap in hywu/Camelidae-8x7B or hywu/Camelidae-8x34B for the other sizes.
tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x13B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x13B", device_map="auto", trust_remote_code=True).eval()

# Build a prompt in the '### Human:' / '### Assistant:' format and generate a reply.
inputs = tokenizer('### Human:\nHow are you?\n### Assistant:\n', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```
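
The bigger checkpoints can be heavy to load in 16-bit. As a hedged variant of the loading step above, assuming `bitsandbytes` is installed and that the repo's custom modeling code is compatible with 4-bit quantization (not verified here), you can try:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumption: 4-bit bitsandbytes loading works with this repo's remote code.
quant_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x34B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "hywu/Camelidae-8x34B",
    device_map="auto",
    quantization_config=quant_config,
    trust_remote_code=True,
).eval()
```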
## Citation