hywu committed
Commit 497dcf4 · 1 Parent(s): 6a1a68f

update README

Files changed (1)
README.md +36 -31
README.md CHANGED
@@ -16,57 +16,62 @@ license: apache-2.0
  # Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

  ## News
- - 1/10/2024 - Camelidae models are now available on [🤗HuggingFace](https://huggingface.co/hywu).
+ - 3/12/2024 - We released Qwen2idae-16x14B-v1.0 on 🤗 [HuggingFace](https://huggingface.co/hywu/Qwen2idae-16x14B-v1.0), which has strong performance in Math and Code with 15B activated params.
+ - 2/7/2024 - [Serp-ai](https://github.com/serp-ai/Parameter-Efficient-MoE) adds [unsloth](https://github.com/serp-ai/unsloth) support for faster and memory-efficient training of our Parameter-Efficient Sparsity Crafting and releases new [sparsetral](https://huggingface.co/serpdotai/sparsetral-16x7B-v2) models based on mistral-7B.
+ - 1/10/2024 - Camelidae models are now available on 🤗 [HuggingFace](https://huggingface.co/hywu).
  - 1/4/2024 - We released the paper, [Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731).
- - 12/22/2023 - We released the training [repo](https://github.com/wuhy68/Parameter-Efficient-MoE) that craft the dense model with LLaMA architecture to the MoE model.
-
+ - 12/22/2023 - We released the training repo that crafts the dense model with LLaMA architecture into the MoE model.
  ## Introduction
- Camelidae models are trained utilizing Parameter-Efficient Sparsity Crafting techniques
+ Camelidae and Qwen2idae models are trained utilizing Parameter-Efficient Sparsity Crafting techniques.

- Parameter-Efficient Sparsity Crafting can help dense models learn knowledge from different fields (including code and math). This appraoch perfrom instruction tuning and utilize MoE structure in an efficient way.
+ We present Parameter-Efficient Sparsity Crafting to help dense models learn knowledge from different fields (including code and math). This approach performs instruction tuning and efficiently utilizes MoE structure.

- Specifically, Parameter-Efficient Sparsity Crafting utilizes parameter efficient techiniques including [QLoRA](https://arxiv.org/abs/2305.14314) and [Adapter](https://arxiv.org/abs/1902.00751) to perfrom Efficient [Sparse Upcycling](https://arxiv.org/abs/2212.05055).
+ Specifically, Parameter-Efficient Sparsity Crafting utilizes parameter-efficient techniques including [QLoRA](https://arxiv.org/abs/2305.14314) and [Adapter](https://arxiv.org/abs/1902.00751) to perform Efficient [Sparse Upcycling](https://arxiv.org/abs/2212.05055).
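
To make the sparse-upcycling idea above concrete, the following is a minimal, illustrative PyTorch sketch (not the authors' implementation and not part of the README itself): a pretrained dense FFN is kept frozen while a learned router and small bottleneck-adapter "experts" are added on top, so only the router and adapters are trained. All class names, sizes, and the top-2 routing details are assumptions made for illustration.

```python
# Illustrative sketch of adapter-based sparse upcycling (hypothetical code,
# not the authors' implementation): keep the pretrained dense FFN frozen and
# add a learned router plus small bottleneck-adapter "experts" on top of it.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdapterExpert(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck, bias=False)
        self.up = nn.Linear(bottleneck, hidden_size, bias=False)
        nn.init.zeros_(self.up.weight)  # start as a no-op so the frozen model is unchanged at init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(F.silu(self.down(x)))


class UpcycledMoEFFN(nn.Module):
    """Frozen dense FFN plus top-k routed adapter experts (parameter-efficient MoE)."""
    def __init__(self, dense_ffn: nn.Module, hidden_size: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.dense_ffn = dense_ffn
        for p in self.dense_ffn.parameters():
            p.requires_grad = False  # the upcycled dense weights stay frozen
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(AdapterExpert(hidden_size) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden)
        out = self.dense_ffn(x)                    # shared dense computation
        gates = F.softmax(self.router(x), dim=-1)  # (batch, seq, num_experts)
        weights, indices = gates.topk(self.top_k, dim=-1)
        # Loop over all experts for clarity; a real implementation would
        # dispatch only the tokens routed to each expert.
        for k in range(self.top_k):
            idx = indices[..., k]               # chosen expert id per token
            w = weights[..., k].unsqueeze(-1)   # its gate weight
            for e, expert in enumerate(self.experts):
                mask = (idx == e).unsqueeze(-1).to(x.dtype)
                out = out + w * mask * expert(x)
        return out
```

Only the router and adapter parameters are trainable here, which is what keeps the upcycling parameter-efficient; in the setting described above, QLoRA-style quantization of the frozen base model would further reduce the training memory footprint.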

  ## Model Lists
- | Model | Download
+ | Camelidae Series | Download
  |---|---
- Camelidae-8x7B | [🤗HuggingFace](https://huggingface.co/hywu/Camelidae-8x7B)
- Camelidae-8x13B | [🤗HuggingFace](https://huggingface.co/hywu/Camelidae-8x13B)
- Camelidae-8x34B | [🤗HuggingFace](https://huggingface.co/hywu/Camelidae-8x34B)
+ Camelidae-8x7B | 🤗 [HuggingFace](https://huggingface.co/hywu/Camelidae-8x7B)
+ Camelidae-8x13B | 🤗 [HuggingFace](https://huggingface.co/hywu/Camelidae-8x13B)
+ Camelidae-8x34B | 🤗 [HuggingFace](https://huggingface.co/hywu/Camelidae-8x34B)
+ Camelidae-8x34B-pro | 🤗 Coming Soon

- ## Performance
- | Model | MMLU (5shot) | GSM8k (5shot) | MATH (4shot) | HumanEval (0shot) | MBPP (4shot) | HellaSwag (10shot) | TriviaQA (0shot) |
- |----------------------:|:------------:|:-------------:|:------------:|:-----------------:|:------------:|:------------------:|:----------------:|
- | GPT3.5 | 70.0% | 57.1% | **34.1%** | **48.1%** | - | 85.5% | - |
- | Camelidae-8x34B | 75.6% | **78.3%** | **22.6%** | **43.9%** | **41.4%** | 85.3% | **63.4%** |
- | SUSChat-34B | **76.4%** | 72.3% | 22.0% | 11.6% | 40.2% | 83.9% | 56.1% |
- | Mixtral-8x7B-instruct | 68.7% | 71.7% | 22.1% | 25.6% | 40.6% | **86.5%** | 57.7% |
- | LLaMA2-70B-chat | 63.8% | 59.3% | 10.4% | 32.3% | 35.6% | 84.8% | 63.0% |
- | Camelidae-8x13B | 54.4% | 52.6% | 9.8% | 30.6% | 30.4% | 82.5% | 59.4% |
- | LLaMA2-13B-chat | 53.9% | 37.1% | 5.2% | 18.9% | 27.2% | 81.9% | 55.0% |
- | Camelidae-8x7B | 48.3% | 44.0% | 5.8% | 18.3% | 23.4% | 79.2% | 51.0% |
- | LLaMA2-7B-chat | 47.2% | 26.3% | 3.9% | 12.2% | 17.6% | 78.6% | 46.4% |
+ | Qwen2idae Series | Download
+ |---|---
+ Qwen2idae-16x14B-v1.0 | 🤗 [HuggingFace](https://huggingface.co/hywu/Qwen2idae-16x14B-v1.0)
+ Qwen2idae-16x7B-v1.0 | 🤗 Coming Soon
+ Qwen2idae-16x1.8B-v1.0 | 🤗 Coming Soon

- We bold the highest scores for open-source models and all models separately.
+ ## Performance
+ | Model | Activated Params | MMLU (5shot) | GSM8k (5shot) | MATH (4shot) | HumanEval (0shot) | MBPP (4shot) | HellaSwag (10shot) |
+ |:-----:|:----------------:|:------------:|:-------------:|:------------:|:-----------------:|:------------:|:------------------:|
+ | GPT3.5 | - | 70.0% | 57.1% | <font color=#F67F70>**34.1%**</font> | <font color=#FBD98D>**48.1%**</font> | - | <font color=#7FEA9E>**85.5%**</font> |
+ | LLaMA2-70B-chat | 70B | 63.8% | 59.3% | 10.4% | 32.3% | 35.6% | 84.8% |
+ | Camelidae-8x34B-pro | 35B | <font color=#7FEA9E>**75.7%**</font> | <font color=#F67F70>**79.4%**</font> | <font color=#FBD98D>**24.0%**</font> | <font color=#7FEA9E>**48.8%**</font> | <font color=#7FEA9E>**43.2%**</font> | 85.2% |
+ | Camelidae-8x34B | 35B | <font color=#FBD98D>**75.6%**</font> | <font color=#7FEA9E>**78.3%**</font> | 22.6% | 43.9% | <font color=#FBD98D>**41.4%**</font> | <font color=#FBD98D>**85.3%**</font> |
+ | SUSChat-34B | 34B | <font color=#F67F70>**76.4%**</font> | 72.3% | 22.0% | 11.6% | 40.2% | 83.9% |
+ | Yi-34B-chat | 34B | 74.8% | 67.6% | 17.3% | 20.1% | 41.0% | 83.9% |
+ | Qwen2idae-16x14B-v1.0 | 15B | 66.7% | <font color=#FBD98D>**77.8%**</font> | <font color=#7FEA9E>**29.9%**</font> | <font color=#F67F70>**62.8%**</font> | <font color=#F67F70>**48.6%**</font> | 82.3% |
+ | Mixtral-8x7B-instruct | 14B | 68.7% | 71.7% | 22.1% | 25.6% | 40.6% | <font color=#F67F70>**86.5%**</font> |
+ | Camelidae-8x13B | 13B | 54.4% | 52.6% | 9.8% | 30.6% | 30.4% | 82.5% |
+ | LLaMA2-13B-chat | 13B | 53.9% | 37.1% | 5.2% | 18.9% | 27.2% | 81.9% |
+ | Camelidae-8x7B | 7B | 48.3% | 44.0% | 5.8% | 18.3% | 23.4% | 79.2% |
+ | LLaMA2-7B-chat | 7B | 47.2% | 26.3% | 3.9% | 12.2% | 17.6% | 78.6% |
+
+ We bold the top-3 scores in each column across all models.


  ## Usage
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

- # tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x7B", trust_remote_code=True)
- # tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x13B", trust_remote_code=True)
- tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x34B", trust_remote_code=True)
-
- # model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x7B", device_map="auto", trust_remote_code=True).eval()
- # model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x13B", device_map="auto", trust_remote_code=True).eval()
- model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x34B", device_map="auto", trust_remote_code=True).eval()
+ tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x13B", trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x13B", device_map="auto", trust_remote_code=True).eval()

  inputs = tokenizer('### Human:\nHow are you?\n### Assistant:\n', return_tensors='pt')
  inputs = inputs.to(model.device)
  pred = model.generate(**inputs)
  print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
- # I am doing well, thank you.
  ```
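
If GPU memory is tight, the same checkpoint can likely also be loaded in 4-bit via bitsandbytes. The snippet below is an untested sketch; the quantization settings are illustrative assumptions rather than recommendations from this README.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical 4-bit (NF4) loading; requires the `bitsandbytes` package.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x13B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "hywu/Camelidae-8x13B",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
).eval()

# Same prompt format as the snippet above.
inputs = tokenizer('### Human:\nHow are you?\n### Assistant:\n', return_tensors='pt').to(model.device)
print(tokenizer.decode(model.generate(**inputs)[0], skip_special_tokens=True))
```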

  ## Citation