<h1 style="text-align: center">MPT-7b-Chat-Instruct-LongCTX-Merge</h1>
<h2 style="text-align: center">A merge between the long context Storywriter and the short context Instruct MPT-7b models.</h2>

## Model description
This is a merged model, using a weighted parameter blend strategy at a 20:20:60 ratio between the following models:

- [60%] - 2048 CTX MPT-7b Chat: https://huggingface.co/spaces/mosaicml/mpt-7b-chat
- [20%] - 2048 CTX MPT-7b Instruct: https://huggingface.co/spaces/mosaicml/mpt-7b-instruct
- [20%] - 65k CTX MPT-7b Storywriter: https://huggingface.co/mosaicml/mpt-7b-storywriter

For a final model composed of:

(MPT-7b Storywriter [200%] + MPT-7b Instruct [50%]) + MPT-7b Chat [60%]
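
In practice, a weighted parameter blend like this amounts to a per-tensor weighted average of the source checkpoints. The sketch below is a hypothetical reconstruction, not the actual script used for this merge: the file paths are placeholders, and it assumes all three same-architecture MPT-7b checkpoints expose identical tensor names and shapes.

```python
import torch

# Hypothetical local paths to the three source state dicts and their blend weights.
PARTS = [
    ("mpt-7b-chat.pt", 0.60),
    ("mpt-7b-instruct.pt", 0.20),
    ("mpt-7b-storywriter.pt", 0.20),
]

def blend_state_dicts(parts):
    """Average parameter tensors key by key, weighted by the given ratios.

    Assumes every checkpoint has identical keys and tensor shapes, which
    holds for same-architecture MPT-7b fine-tunes.
    """
    merged = None
    for path, weight in parts:
        sd = torch.load(path, map_location="cpu")
        if merged is None:
            merged = {k: weight * v.float() for k, v in sd.items()}
        else:
            for k, v in sd.items():
                merged[k] += weight * v.float()
    return merged

merged_sd = blend_state_dicts(PARTS)
torch.save(merged_sd, "mpt-7b-chat-instruct-longctx-merge.pt")
```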

----

This was done for the sake of testing the theory of how long context tunes affect attention when merged with a model that has been trained for a different purpose, on a shorter context span.
Unlike the first merge [(which sports a 50/50 ratio)](https://huggingface.co/TehVenom/mpt-7b-InstructAndStorywriting-50_50-Merge), this one is lopsided towards the short context Chat/Instruct base models, to have another comparison point for the effects of CTX span merging, and to have a model that is primarily focused on Instruct behavior.

There are two objectives for this merge: the first is to see how much of the 65k Storywriter model is necessary to raise the ceiling of the final model's context size,
and the second is to try to make the base Chat model less dry, and slightly more fun / verbose and intelligent, by blending the literature / Instruct based models into it.

Due to the influence of MPT-7b Storywriter and the wide array of books it was sampled from, this model may generate content that is considered NSFW.

The specific prompting format is unknown, but try approaching it as a chat bot scenario / prompt.

Try starting with a two line prompt such as:
```
Bot: "Hello, how are you?"
You: "I am doing just fine, thank you."
```

Or try giving it instructions such as:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Explain the concept of artificial intelligence in simple terms.

### Response:
Artificial Intelligence (AI) is the ability of machines and computers to make decisions and complete tasks similar to what humans can do.
AI can learn from data, recognize patterns, and make predictions.
AI technology can automate tedious and complex processes, helping to reduce human effort and errors, and enable more accurate and efficient decisions.
AI can be used in a wide range of applications, from robotics and autonomous vehicles to healthcare and finance. It is increasingly becoming an integral part of everyday life.
```

Check out the datasets involved in the MPT-7b Chat model to get a better grasp of how to prompt it well:

- [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca)
- [Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)

Read the original model cards to understand how to run inference on this model.
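
As a convenience, here is a minimal inference sketch following the standard MPT loading recipe from the MosaicML cards (MPT ships custom modeling code, so `trust_remote_code=True` is required). The repository id and generation settings below are assumptions, not values from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id for this merge; substitute the actual HF path.
model_name = "TehVenom/MPT-7b-Chat-Instruct-LongCTX-Merge"

# The MPT-7b family uses the GPT-NeoX-20B tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# MPT ships custom modeling code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model.eval()

# Chat-style two line prompt, as suggested above.
prompt = 'Bot: "Hello, how are you?"\nYou: "I am doing just fine, thank you."\nBot:'
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```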