LoneStriker committed on
Commit 8e5ef2b
1 Parent(s): da7712e

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -1,35 +1,5 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Buzz-8b-Large-v0.5-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
+ Buzz-8b-Large-v0.5-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Buzz-8b-Large-v0.5-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Buzz-8b-Large-v0.5-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Buzz-8b-Large-v0.5-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Buzz-8b-Large-v0.5-Q3_K_L.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45510204b884cf5b10ac6a5e49f4b76dfbb07b180014bff7a1a359b2d4585ea7
+ size 4321955744
Buzz-8b-Large-v0.5-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3b887efa72f1aa69c9c57e34985e0a472b6c0f8922dc52b508cfc6279acb775
+ size 4920733600
Buzz-8b-Large-v0.5-Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:629b42ff6024f59793abc0cc19a0c3c32401ec6f5147d8256320bbba8aa53d1d
+ size 5732986784
Buzz-8b-Large-v0.5-Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7fb3b8119e952f9fd763eb605f9e2159e358cba60edf4d2a84e9bccbe9faf225
+ size 6596005792
Buzz-8b-Large-v0.5-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8dec7a92a5a71ed31f4cb18a4b1f3cb71b78cc9de4a56f6311c1554442dcab1c
+ size 8540770208
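The five quantizations above trade file size for output quality. As a rough sizing aid, here is a small Python sketch; the `pick_quant` helper is ours (not part of this repo), and the size table is copied from the LFS pointers above. It returns the largest quant that fits a given memory budget:

```python
# Byte sizes of each quant, taken from the LFS pointers in this commit.
QUANT_SIZES = {
    "Q3_K_L": 4321955744,
    "Q4_K_M": 4920733600,
    "Q5_K_M": 5732986784,
    "Q6_K":   6596005792,
    "Q8_0":   8540770208,
}

def pick_quant(max_bytes: int) -> str:
    """Return the name of the largest quant whose file fits in max_bytes."""
    fitting = {name: size for name, size in QUANT_SIZES.items() if size <= max_bytes}
    if not fitting:
        raise ValueError("no quantization fits the given budget")
    return max(fitting, key=fitting.get)

# With roughly 6 GiB to spare, Q5_K_M is the largest quant that fits:
print(pick_quant(6 * 1024**3))
# The chosen file could then be fetched with huggingface_hub, e.g.
# hf_hub_download(repo_id="<this-repo>", filename=f"Buzz-8b-Large-v0.5-{name}.gguf")
```

Note that the file size is only a lower bound on memory use; the KV cache and compute buffers add overhead on top of it.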
README.md ADDED
@@ -0,0 +1,165 @@
---
base_model: Alignment-Lab-AI/Neural-network-medium-untuned-theta
tags:
- axolotl
- Alignment-Lab-AI
- Meta-Llama-3
model-index:
- name: Buzz-8b-Large-0.5
  results: []
license: apache-2.0
datasets:
- H-D-T/Buzz
language:
- en
---
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6436279eaaef013d1af225c9/fWaQucBWfabfnMsAFN8hv.png)
# Buzz-8b-Large: Advancing Efficiency through Iterative Fine-Tuning

## Introduction

[Alignment Lab AI](https://AlignmentLab.ai) is pleased to introduce our latest research effort:

**Buzz-8b-Large**, a state-of-the-art language model developed in collaboration with [Hive Digital Technologies](https://hivedt.com/).

The Buzz model, dataset, and code are released as a toolkit that demonstrates how existing pretrained language models can be reused and optimized to keep raising the performance attainable from a given FLOPs budget. Alongside this model, we release:

- [The Buzz Dataset](https://huggingface.co/datasets/H-D-T/Buzz)
- [Buzz-2.5b-Small](https://huggingface.co/tempbuzz/Lab-AI/Buzz-3b-Small-v0.5)
- [Buzz-5b-Medium](https://huggingface.co/tempbuzz/Lab-AI/Buzz-5B-Medium-v0.5)
- [Buzz-8B-Large](https://huggingface.co/tempbuzz/Lab-AI/Buzz-8B-Large-v0.5)

Today we release the **Buzz dataset**; over the coming days we will also release two additional models, **Buzz-3B-Small** and **Buzz-5B-Medium**, along with the codebase used to refine, filter, and augment the data, and to prune and train your own variants.

## Iterative Fine-Tuning Methodology

Our research builds upon the concepts introduced in several key papers, including:

- [Simple and Scalable Strategies to Continually Pre-train Large Language Models](https://arxiv.org/abs/2403.08763)
- [NEFTune: Noisy Embeddings Improve Instruction Finetuning](https://arxiv.org/abs/2310.05914)
- [An Optimistic Acceleration of AMSGrad for Nonconvex Optimization](https://arxiv.org/abs/1903.01435)
- [Improving Generalization Performance by Switching from Adam to SGD](https://arxiv.org/abs/1712.07628)
- [Orca: Progressive Learning from Complex Explanation Traces of GPT-4](https://arxiv.org/abs/2306.02707v1)

By combining high-quality data with iterative fine-tuning over carefully selected "grounding" distributions drawn from previous epochs, we have developed a cost-effective approach that pushes the boundaries of model reuse and optimization.
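Of the papers listed above, NEFTune is the most self-contained to illustrate: during fine-tuning it adds uniform noise to the embedding outputs, scaled by alpha / sqrt(L * d) so that the perturbation magnitude is independent of sequence length L and embedding dimension d. The NumPy sketch below is our illustration of that scaling rule, not the training code used for Buzz:

```python
import numpy as np

def neftune_noise(embeddings: np.ndarray, alpha: float = 5.0, rng=None) -> np.ndarray:
    """Add NEFTune-style uniform noise to a (seq_len, dim) embedding matrix.

    Per the NEFTune paper, noise is sampled from U(-1, 1) and scaled by
    alpha / sqrt(L * d), where L is the sequence length and d the embedding
    dimension, keeping the expected perturbation size constant across shapes.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    L, d = embeddings.shape
    scale = alpha / np.sqrt(L * d)
    noise = rng.uniform(-1.0, 1.0, size=(L, d)) * scale
    return embeddings + noise

# Every element is perturbed by at most alpha / sqrt(L * d):
emb = np.zeros((128, 4096))
out = neftune_noise(emb, alpha=5.0)
assert np.abs(out).max() <= 5.0 / np.sqrt(128 * 4096)
```

In an actual training loop this noise is applied only during fine-tuning (never at inference), typically by hooking the embedding layer's forward pass.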

## Notably, the models have not yet appeared to plateau with the application of these techniques

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6436279eaaef013d1af225c9/wyHyDIJnNmbomonZKQAD0.png)

## Chat Template and Inference

To use the Buzz-8b-Large model for chat-based tasks, you can use the provided chat template. Here's an example of how to perform inference with the Hugging Face Transformers library:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
model_name = "H-D-T/Buzz-8b-Large-v0.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Set the device to run the model on ("cuda" for GPU, "cpu" for CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Define and tokenize the input prompt
prompt = "Hello, how are you today?"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

# Generate the model's response
output = model.generate(
    input_ids,
    max_length=100,
    num_return_sequences=1,
    no_repeat_ngram_size=2,
)

# Decode the generated response
response = tokenizer.decode(output[0], skip_special_tokens=True)

print("Input:", prompt)
print("Response:", response)
```
NOTE: this is a COMPLETIONS model. By default it generates text that continues whatever you send it, and it only has a *start* token (`<|begin_of_text|>`) and a *stop* token (`<|end_of_text|>`).
If you want it to hold conversations reliably, append `<|end_of_text|>\n<|begin_of_text|>assistant:` to the end of your prompt. (The speaker name `assistant` is flexible and can be tooled to the type of response you want; for example, `Mathematician:` will give you a different kind of response than `felon:`.)

Later iterations of the model will likely use formatting similar to *openchat*.
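The wrapping described in the note above can be sketched as a small helper; `format_chat_prompt` is a hypothetical name of ours, and the function is plain string assembly following the model card's instructions:

```python
BOT = "<|begin_of_text|>"  # start token
EOT = "<|end_of_text|>"    # stop token

def format_chat_prompt(user_text: str, speaker: str = "assistant") -> str:
    """Wrap a message so the completions model replies as `speaker`.

    Per the model card: append <|end_of_text|>\n<|begin_of_text|><speaker>:
    to the prompt, so the model's completion reads as that speaker's turn.
    """
    return f"{user_text}{EOT}\n{BOT}{speaker}: "

print(format_chat_prompt("What is the capital of France?"))
```

The resulting string is what you would tokenize and pass to `model.generate` in place of the raw prompt; swapping `speaker` for `Mathematician` (or any persona) steers the style of the completion as described above.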

## Conclusion

We intend to focus on *updating* and improving the performance of these models and the surrounding open-source infrastructure. Our next effort will focus on context length, implementing the research currently being conducted by [Wing-Lian](https://github.com/winglian), the lead developer of the [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) training framework that underpins these experiments. We encourage the community to explore Wing-Lian's work, such as the [Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE) and [llama-3-8b-256k-PoSE](https://huggingface.co/winglian/llama-3-8b-256k-PoSE) models, which showcase the potential for further advancements in language modeling.

Buzz hopes to be a proof of concept, and a toolkit to demonstrate and enable the community in the pursuit of efficient and effective locally run, personally owned language models. Through collaboration with [Hive Digital Technologies](https://hivedigitaltechnologies.com/), who have enabled us to perform this research, we have demonstrated the immense potential for model reuse and optimization.

## Credits

To the many researchers who have open sourced their knowledge and tools to allow us to pursue this.

To [Hive Digital Technologies](https://hivedigitaltechnologies.com/) for providing compute, advice, and meaningful research insight.

To [Meta](https://llama.meta.com) for developing the Llama models, and for maintaining a philosophy of supporting open research and open source.

To wing et al. with the [Open Access AI Collective](https://github.com/OpenAccess-AI-Collective) for developing [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl), assisting with research, and generally being geniuses.

To [Thomas Capelle](https://wandb.ai/capecape) et al. working on [LLM_Surgery](https://wandb.ai/llm_surgery).

As well as many, many others who are too numerous to name.

# Citations

```
@misc{ibrahim2024simple,
  title={Simple and Scalable Strategies to Continually Pre-train Large Language Models},
  author={Adam Ibrahim and Benjamin Thérien and Kshitij Gupta and Mats L. Richter and Quentin Anthony and Timothée Lesort and Eugene Belilovsky and Irina Rish},
  year={2024},
  eprint={2403.08763},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@misc{jain2023neftune,
  title={NEFTune: Noisy Embeddings Improve Instruction Finetuning},
  author={Neel Jain and Ping-yeh Chiang and Yuxin Wen and John Kirchenbauer and Hong-Min Chu and Gowthami Somepalli and Brian R. Bartoldson and Bhavya Kailkhura and Avi Schwarzschild and Aniruddha Saha and Micah Goldblum and Jonas Geiping and Tom Goldstein},
  year={2023},
  eprint={2310.05914},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{wang2020optimistic,
  title={An Optimistic Acceleration of AMSGrad for Nonconvex Optimization},
  author={Jun-Kun Wang and Xiaoyun Li and Belhal Karimi and Ping Li},
  year={2020},
  eprint={1903.01435},
  archivePrefix={arXiv},
  primaryClass={stat.ML}
}

@misc{keskar2017improving,
  title={Improving Generalization Performance by Switching from Adam to SGD},
  author={Nitish Shirish Keskar and Richard Socher},
  year={2017},
  eprint={1712.07628},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@misc{mukherjee2023orca,
  title={Orca: Progressive Learning from Complex Explanation Traces of GPT-4},
  author={Subhabrata Mukherjee and Arindam Mitra and Ganesh Jawahar and Sahaj Agarwal and Hamid Palangi and Ahmed Awadallah},
  year={2023},
  eprint={2306.02707},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```