---
base_model: H-D-T/Buzz-8b-Large-v0.5
tags:
- axolotl
- Alignment-Lab-AI
- Meta-Llama-3
model-index:
- name: Buzz-8b-Large-0.5
  results: []
license: apache-2.0
datasets:
- H-D-T/Buzz
language:
- en
---

# Buzz-8b-Large-GGUF

- This is a quantized version of [H-D-T/Buzz-8b-Large-v0.5](https://huggingface.co/H-D-T/Buzz-8b-Large-v0.5), created using llama.cpp.

## Model Description

[Alignment Lab AI](https://AlignmentLab.ai) is pleased to introduce our latest research effort:

**Buzz-8b-Large**, a state-of-the-art language model developed in collaboration with [Hive Digital Technologies](https://hivedt.com/).

The Buzz model, dataset, and code are being released as a toolkit that demonstrates how existing pretrained language models can be reused and optimized to continuously raise the performance achievable with an optimal use of FLOPs. Alongside Buzz-5b-Medium, we release:

- [The Buzz Dataset](https://huggingface.co/datasets/H-D-T/Buzz)
- [Buzz-3b-Small](https://huggingface.co/tempbuzz/Lab-AI/Buzz-3b-Small-v0.5)
- [Buzz-5b-Medium](https://huggingface.co/tempbuzz/Lab-AI/Buzz-5B-Medium-v0.5)
- [Buzz-8B-Large](https://huggingface.co/tempbuzz/Lab-AI/Buzz-8B-Large-v0.5)

Over the next few days we will release two additional models, **Buzz-3B-Small** and **Buzz-5B-Medium**. The codebase to refine, filter, and augment the data, as well as to prune and train your own variants, will also be released in the coming days.

## Iterative Fine-Tuning Methodology

Our research builds upon the concepts introduced in several key papers, including:

- [Simple and Scalable Strategies to Continually Pre-train Large Language Models](https://arxiv.org/abs/2403.08763)
- [NEFTune: Noisy Embeddings Improve Instruction Finetuning](https://arxiv.org/abs/2310.05914)
- [An Optimistic Acceleration of AMSGrad for Nonconvex Optimization](https://arxiv.org/abs/1903.01435)
- [Improving Generalization Performance by Switching from Adam to SGD](https://arxiv.org/abs/1712.07628)
- [Orca: Progressive Learning from Complex Explanation Traces of GPT-4](https://arxiv.org/abs/2306.02707v1)

By combining high-quality data with iterative fine-tuning on carefully selected "grounding" distributions from previous epochs, we have developed a cost-effective approach that pushes the boundaries of model reuse and optimization.
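
The "grounding" idea above can be sketched as a simple data-mixing step. This is a minimal illustration only, not the Buzz training recipe: the function name `mix_with_grounding` and the 20% fraction are hypothetical choices for the sketch.

```python
import random

def mix_with_grounding(new_data, previous_epoch_data, grounding_fraction=0.2, seed=0):
    """Blend a fraction of samples from previous epochs into the new
    training mixture, so each epoch stays "grounded" in the earlier
    distributions. grounding_fraction is an assumed knob, not a value
    taken from the Buzz recipe."""
    rng = random.Random(seed)
    # Number of grounding samples, proportional to the new data size
    n_ground = int(len(new_data) * grounding_fraction)
    # Draw distinct grounding samples from the previous epoch's data
    grounded = new_data + rng.sample(previous_epoch_data, min(n_ground, len(previous_epoch_data)))
    rng.shuffle(grounded)
    return grounded

# Example: 100 new samples mixed with 20 grounding samples from a prior epoch
mixture = mix_with_grounding(list(range(100)), list(range(1000, 1100)))
print(len(mixture))
```

In practice the grounding fraction and the choice of which previous-epoch samples to retain are tuning decisions; the sketch only shows the mixing mechanism.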

## Notably, we observe that the models have not yet appeared to plateau with the application of these techniques

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6436279eaaef013d1af225c9/wyHyDIJnNmbomonZKQAD0.png)

## Chat Template and Inference

To use the Buzz-8b-Large model for chat-based tasks, you can utilize the provided chat template. Here's an example of how to perform inference using the Hugging Face Transformers library:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
model_name = "H-D-T/Buzz-8b-Large-v0.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Set the device to run the model on (e.g., "cuda" for GPU, "cpu" for CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Define the input prompt
prompt = "Hello, how are you today?"

# Tokenize the input prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

# Generate the model's response
output = model.generate(
    input_ids,
    max_new_tokens=100,
    num_return_sequences=1,
    no_repeat_ngram_size=2,
)

# Decode the generated response
response = tokenizer.decode(output[0], skip_special_tokens=True)

print("Input:", prompt)
print("Response:", response)
```
NOTE: this model is a COMPLETIONS model. By default it generates text that completes whatever you send it; it has only a *start* token, `<|begin_of_text|>`, and a *stop* token, `<|end_of_text|>`. If you want it to hold conversations reliably, append `<|end_of_text|>\n<|begin_of_text|>assistant:` to the end of your prompt. (The speaker label `assistant` is flexible and can be tailored to the type of response you want; for example, `Mathematician:` will give you a different kind of response than `felon:`.)

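The prompt convention above can be sketched with a small helper. This is a minimal illustration; `format_chat_prompt` is a hypothetical name for the sketch, not part of the model's tooling.

```python
# Sketch: wrap a user message with the stop/start token pair and a speaker
# label, so the completions model continues the text as that speaker's reply.
def format_chat_prompt(user_message: str, speaker: str = "assistant") -> str:
    """Append <|end_of_text|>\\n<|begin_of_text|> and a speaker label,
    per the conversation convention described above."""
    return f"{user_message}<|end_of_text|>\n<|begin_of_text|>{speaker}:"

prompt = format_chat_prompt("Hello, how are you today?")
print(prompt)
```

The resulting string is what you would pass to `tokenizer.encode` in the inference example above; swapping the `speaker` argument (e.g. `"Mathematician"`) steers the style of the completion.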
Later iterations of the model will likely use formatting similar to *openchat*.

## Conclusion

We intend to focus on *updating* and improving the performance of these models and the surrounding open-source infrastructure. Our next effort will focus on context length, implementing the research currently being conducted by [Wing-Lian](https://github.com/winglian), the lead developer of the [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) training framework that underpins these experiments. We encourage the community to explore Wing-Lian's work, such as the [Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE) and [llama-3-8b-256k-PoSE](https://huggingface.co/winglian/llama-3-8b-256k-PoSE) models, which showcase the potential for further advancements in language modeling.

Buzz aims to be a proof of concept, and a toolkit to demonstrate and enable the community in the pursuit of efficient and effective locally run, personally owned language models. Through our collaboration with [Hive Digital Technologies](https://hivedigitaltechnologies.com/), who have enabled us to perform this research, we have demonstrated the immense potential for model reuse and optimization.