<h1 style="text-align: center">MPT-7b-Chat-Instruct-LongCTX-Merge</h1>

<h2 style="text-align: center">A merge between the long context Storywriting and the short context Chat oriented MPT-7b models.</h2>

## Model description

This is a merged model, using a weighted parameter blend strategy at a (20:20:60) ratio between the models:

- [60%] - 2048 CTX MPT-7b Chat: https://huggingface.co/TehVenom/MPT-7b-chat-V
- [20%] - 2048 CTX MPT-7b Instruct: https://huggingface.co/TehVenom/MPT-7b-instruct-V
- [20%] - 65k CTX MPT-7b Storywriter: https://huggingface.co/TehVenom/MPT-7b-storywriter-Apache-2.0/

For a final model composed of:

(MPT-7b Storywriter [20%] + MPT-7b Instruct [20%]) + MPT-7b Chat [60%]
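
As a rough illustration of what a weighted parameter blend means, the sketch below averages matching parameters from several checkpoints using fixed weights. It is not the script used to build this model: the function name `blend_state_dicts` and the toy scalar "parameters" are hypothetical, and the two-stage variant is only one possible reading of the parenthesised grouping above. The same arithmetic works on any values that support `*` and `+`, such as PyTorch tensors taken from `model.state_dict()`.

```python
def blend_state_dicts(state_dicts, weights):
    """Return the weighted average of several parameter dictionaries."""
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per state dict")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights should sum to 1")
    names = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != names:
            raise ValueError("all models must share the same parameter names")

    merged = {}
    for name in state_dicts[0]:
        # Accumulate weight * parameter across all source models.
        total = state_dicts[0][name] * weights[0]
        for sd, w in zip(state_dicts[1:], weights[1:]):
            total = total + sd[name] * w
        merged[name] = total
    return merged


# Toy stand-ins for the three checkpoints (hypothetical values).
chat = {"w": 1.0, "b": 0.0}
instruct = {"w": 2.0, "b": 1.0}
storywriter = {"w": 3.0, "b": 2.0}

# Single-stage blend at Storywriter 20% : Instruct 20% : Chat 60%.
merged = blend_state_dicts([chat, instruct, storywriter], [0.6, 0.2, 0.2])

# Assumed two-stage reading: blend Storywriter and Instruct 50/50 first,
# then blend that intermediate with Chat at 40/60.
intermediate = blend_state_dicts([storywriter, instruct], [0.5, 0.5])
two_stage = blend_state_dicts([intermediate, chat], [0.4, 0.6])
```

Both routes give the same effective contribution per model, because 0.5 × 0.4 = 0.2 for each of Storywriter and Instruct.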

----

This was done to test how long-context tunes affect attention when merged with a model that was trained for a different purpose on a shorter context span.

Unlike the first merge [(which sports a 50/50 ratio)](https://huggingface.co/TehVenom/mpt-7b-InstructAndStorywriting-50_50-Merge), this one is lopsided towards the Chat base model, both to provide another comparison point for the effects of CTX span merging and to produce a model that is primarily focused on chat.

There are two objectives for this merge. The first is to see how much of the 65k-Storywriter model is necessary to raise the ceiling of the final model's context size. The second is to make the base Chat model less dry, slightly more fun and verbose, and more intelligent by blending the literature-based and Instruct-based models into it.

Because MPT-7b Storywriter was trained on a wide array of books, this model may generate content that is considered NSFW.

The specific prompting format is unknown, but try approaching it as a chat bot scenario / prompt.

Try starting with a two-line prompt such as:

```
Bot: "Hello, how are you?"
You: "I am doing just fine, thank you."
```
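
A chat-style prompt like the one above can be assembled programmatically. The helper below is a minimal sketch: the name `build_chat_prompt`, the speaker labels, and the choice to end on an open quote for the next speaker are all assumptions, since the exact prompting format for this merge is unknown.

```python
def build_chat_prompt(turns, next_speaker="Bot"):
    """Format (speaker, text) pairs as quoted dialogue lines.

    The final line is left open so the model continues as `next_speaker`.
    """
    lines = [f'{speaker}: "{text}"' for speaker, text in turns]
    lines.append(f'{next_speaker}: "')
    return "\n".join(lines)


prompt = build_chat_prompt(
    [
        ("Bot", "Hello, how are you?"),
        ("You", "I am doing just fine, thank you."),
    ]
)
```

When generating, it may help to stop at the next closing quote or newline so the model does not write both sides of the conversation.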

Or try giving it instructions such as:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Explain the concept of artificial intelligence in simple terms.

### Response:
Artificial Intelligence (AI) is the ability of machines and computers to make decisions and complete tasks similar to what humans can do.
AI can learn from data, recognize patterns, and make predictions.
AI technology can automate tedious and complex processes, helping to reduce human effort and errors, and enable more accurate and efficient decisions.
AI can be used in a wide range of applications, from robotics and autonomous vehicles to healthcare and finance. It is increasingly becoming an integral part of everyday life.
```
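
The instruction layout shown above can be wrapped in a small template. This is a sketch only: `INSTRUCTION_TEMPLATE` and `build_instruction_prompt` are hypothetical names, and the single blank line between sections is an assumption about the intended spacing.

```python
INSTRUCTION_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_instruction_prompt(instruction):
    """Fill the template, leaving the response section empty for the model."""
    return INSTRUCTION_TEMPLATE.format(instruction=instruction.strip())


prompt = build_instruction_prompt(
    "Explain the concept of artificial intelligence in simple terms."
)
```

The prompt ends right after `### Response:` so that whatever the model generates next is the answer.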

Check out the datasets involved in the MPT-7b Chat model to get a better grasp on how to prompt it well:

- [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca)
- [Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)

Read the original model cards to understand how to run inference on this model.
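
For orientation, here is a hedged sketch of inference with the `transformers` library, following the general pattern used for MPT-7b models. Several details are assumptions rather than facts from this card: the repository id is inferred from the title, the tokenizer is assumed to be the GPT-NeoX one used by the MPT-7b family, and the sampling settings are arbitrary. The original model cards remain the authoritative source.

```python
# Assumed repository id, inferred from this card's title.
MODEL_REPO = "TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge"
# Assumed tokenizer: the GPT-NeoX tokenizer used by the MPT-7b family.
TOKENIZER_REPO = "EleutherAI/gpt-neox-20b"


def generate_reply(prompt, max_new_tokens=128, max_seq_len=None):
    """Load the model and return only the newly generated text."""
    # Heavy imports live inside the function so defining it stays cheap.
    import torch
    from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_REPO)

    # MPT ships custom model code, hence trust_remote_code=True.
    config = AutoConfig.from_pretrained(MODEL_REPO, trust_remote_code=True)
    if max_seq_len is not None:
        # Optionally raise the context window to probe longer prompts.
        config.max_seq_len = max_seq_len

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_REPO,
        config=config,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    )
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
        )

    # Drop the prompt tokens and decode only the continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call such as `generate_reply('Bot: "Hello, how are you?"\nYou: "', max_seq_len=4096)` downloads the weights on first use, so expect it to need several gigabytes of disk space and memory.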