<h1 style="text-align: center">MPT-7b-Chat-Instruct-LongCTX-Merge</h1>
<h2 style="text-align: center">A merge between the long context Storywriter and the short context Instruct MPT-7b models.</h2>

## Model description
This is a merged model, using a weighted parameter blend strategy at a 20:20:60 ratio between the following models:

- [60%] - 2048 CTX MPT-7b Chat: https://huggingface.co/spaces/mosaicml/mpt-7b-chat
- [20%] - 2048 CTX MPT-7b Instruct: https://huggingface.co/spaces/mosaicml/mpt-7b-instruct
- [20%] - 65k CTX MPT-7b Storywriter: https://huggingface.co/mosaicml/mpt-7b-storywriter

For a final model composed of:

(MPT-7b Storywriter [200%] + MPT-7b Instruct [50%]) + MPT-7b Chat [60%]
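
In practice, a weighted parameter blend like this amounts to a per-tensor weighted average of the source checkpoints. The sketch below is a hypothetical reconstruction, not the actual script used for this merge: the file paths are placeholders, and it assumes all three same-architecture MPT-7b checkpoints expose identical tensor names and shapes.

```python
import torch

# Hypothetical local paths to the three source state dicts and their blend weights.
PARTS = [
    ("mpt-7b-chat.pt", 0.60),
    ("mpt-7b-instruct.pt", 0.20),
    ("mpt-7b-storywriter.pt", 0.20),
]

def blend_state_dicts(parts):
    """Average parameter tensors key by key, weighted by the given ratios.

    Assumes every checkpoint has identical keys and tensor shapes, which
    holds for same-architecture MPT-7b fine-tunes.
    """
    merged = None
    for path, weight in parts:
        sd = torch.load(path, map_location="cpu")
        if merged is None:
            merged = {k: weight * v.float() for k, v in sd.items()}
        else:
            for k, v in sd.items():
                merged[k] += weight * v.float()
    return merged

merged_sd = blend_state_dicts(PARTS)
torch.save(merged_sd, "mpt-7b-chat-instruct-longctx-merge.pt")
```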

----

This was done for the sake of testing the theory of how long context tunes affect attention when merged with a model that has been trained for a different purpose, on a shorter context span.
Unlike the first merge [(which sports a 50/50 ratio)](https://huggingface.co/TehVenom/mpt-7b-InstructAndStorywriting-50_50-Merge), this one is lopsided towards the short context Chat/Instruct base models, to have another comparison point for the effects of CTX span merging, and to have a model that is primarily focused on Instruct behavior.

There are two objectives for this merge: the first is to see how much of the 65k Storywriter model is necessary to raise the ceiling of the final model's context size,
and the second is to try to make the base Chat model less dry, and slightly more fun / verbose and intelligent, by blending the literature / Instruct based models into it.

Due to the influence of MPT-7b Storywriter and the wide array of books it was sampled from, this model may generate content that is considered NSFW.

The specific prompting format is unknown, but try approaching it as a chat bot scenario / prompt.

Try starting with a two line prompt such as:
```
Bot: "Hello, how are you?"
You: "I am doing just fine, thank you."
```

Or try giving it instructions such as:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Explain the concept of artificial intelligence in simple terms.

### Response:
Artificial Intelligence (AI) is the ability of machines and computers to make decisions and complete tasks similar to what humans can do.
AI can learn from data, recognize patterns, and make predictions.
AI technology can automate tedious and complex processes, helping to reduce human effort and errors, and enable more accurate and efficient decisions.
AI can be used in a wide range of applications, from robotics and autonomous vehicles to healthcare and finance. It is increasingly becoming an integral part of everyday life.
```

Check out the datasets involved in the MPT-7b Chat model to get a better grasp of how to prompt it well:

- [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca)
- [Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)

Read the original model cards to understand how to run inference on this model.
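
As a convenience, here is a minimal inference sketch following the standard MPT loading recipe from the MosaicML cards (MPT ships custom modeling code, so `trust_remote_code=True` is required). The repository id and generation settings below are assumptions, not values from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id for this merge; substitute the actual HF path.
model_name = "TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge"

# The MPT-7b family uses the GPT-NeoX-20B tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# MPT ships custom modeling code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model.eval()

# Chat-style two line prompt, as suggested above.
prompt = 'Bot: "Hello, how are you?"\nYou: "I am doing just fine, thank you."\nBot:'
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```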