<h1 style="text-align: center">MPT-7b-Chat-Instruct-LongCTX-Merge</h1>
<h2 style="text-align: center">A merge between the long-context Storywriter and the short-context Chat / Instruct MPT-7b models.</h2>

## Model description
This is a merged model, using a weighted parameter blend strategy at a 20:20:60 ratio between the following models:

- [60%] - 2048 CTX MPT-7b Chat: https://huggingface.co/spaces/mosaicml/mpt-7b-chat
- [20%] - 2048 CTX MPT-7b Instruct: https://huggingface.co/spaces/mosaicml/mpt-7b-instruct
- [20%] - 65k CTX MPT-7b Storywriter: https://huggingface.co/mosaicml/mpt-7b-storywriter

For a final model composed of:

(MPT-7b Storywriter [200%] + MPT-7b Instruct [50%]) + MPT-7b Chat [60%]
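
Below is a minimal sketch of what such a weighted blend can look like. This is an illustrative reconstruction, not the original merge script: it assumes a simple per-tensor weighted average of the three checkpoints' parameters at the 20:20:60 ratio above.

```python
# Hypothetical sketch of a weighted parameter blend (not the original merge script).
# Assumes all three checkpoints share the same architecture and tensor shapes.
import torch
from transformers import AutoModelForCausalLM

WEIGHTS = {
    "mosaicml/mpt-7b-chat": 0.6,
    "mosaicml/mpt-7b-instruct": 0.2,
    "mosaicml/mpt-7b-storywriter": 0.2,
}

merged = None
for repo_id, weight in WEIGHTS.items():
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, trust_remote_code=True, torch_dtype=torch.float32
    )
    state = model.state_dict()
    if merged is None:
        merged = {k: v * weight for k, v in state.items()}
    else:
        for k, v in state.items():
            merged[k] += v * weight
    del model, state  # free RAM between the ~7B-parameter loads

# Reload one of the architectures and write the blended weights into it.
base = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-chat", trust_remote_code=True, torch_dtype=torch.float32
)
base.load_state_dict(merged)
base.save_pretrained("mpt-7b-chat-instruct-longctx-merge")
```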

----

This was done to test the theory of how long-context tunes affect attention when merged with a model that has been trained for a different purpose, on a shorter context span.
Unlike the first merge [(which uses a 50/50 ratio)](https://huggingface.co/TehVenom/mpt-7b-InstructAndStorywriting-50_50-Merge), this one is lopsided towards the short-context Chat / Instruct base models, both to provide another comparison point for the effects of CTX span merging and to yield a model that is primarily focused on instruction following.

There are two objectives for this merge. The first is to see how much of the 65k-CTX Storywriter model is necessary to raise the ceiling of the final model's context size;
the second is to make the base Chat model less dry, and slightly more fun, verbose, and intelligent, by blending the literature- and Instruct-based models into it.
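
On the context-ceiling point: the original MPT model cards show that, because MPT uses ALiBi rather than learned positional embeddings, the maximum sequence length can be overridden at load time. A minimal sketch, assuming the merged checkpoint keeps the standard MPT configuration (the repo id and the 8192 value are illustrative, not tested limits for this merge):

```python
# Sketch of raising the usable context window at load time, following the
# pattern from the original MPT model cards. Values here are assumptions.
import transformers

name = "TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge"  # hypothetical repo id

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 8192  # the Chat / Instruct bases were trained at 2048

model = transformers.AutoModelForCausalLM.from_pretrained(
    name, config=config, trust_remote_code=True
)
```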

Due to the wide array of books sampled for MPT-7b Storywriter, this model may generate content that is considered NSFW.

The ideal prompt format is unknown, so try approaching it as a chatbot scenario / prompt.

Try starting with a two-line prompt such as:
```
Bot: "Hello, how are you?"
You: "I am doing just fine, thank you."
```

Or try giving it instructions such as:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Explain the concept of artificial intelligence in simple terms.

### Response:
Artificial Intelligence (AI) is the ability of machines and computers to make decisions and complete tasks similar to what humans can do.
AI can learn from data, recognize patterns, and make predictions.
AI technology can automate tedious and complex processes, helping to reduce human effort and errors, and enable more accurate and efficient decisions.
AI can be used in a wide range of applications, from robotics and autonomous vehicles to healthcare and finance. It is increasingly becoming an integral part of everyday life.
```

Check out the datasets used for the MPT-7b Chat model to get a better grasp of how to prompt it well:

- [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca)
- [Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)

Read the original model cards to understand how to run inference on this model.
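
For convenience, here is a minimal generation sketch in the style of those cards. The repo id is a hypothetical placeholder for this merge, and the GPT-NeoX-20B tokenizer follows the upstream MPT-7b model cards; adjust both as needed:

```python
# Minimal inference sketch following the original MPT-7b model cards.
# The repo id below is an assumption; point it at the actual merged checkpoint.
import torch
import transformers

name = "TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge"  # hypothetical repo id

model = transformers.AutoModelForCausalLM.from_pretrained(
    name, trust_remote_code=True, torch_dtype=torch.bfloat16
)
# The MPT-7b family uses the EleutherAI/gpt-neox-20b tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

prompt = 'Bot: "Hello, how are you?"\nYou: "I am doing just fine, thank you."\nBot:'
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```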