|
<h1 style="text-align: center">MPT-7b-Chat-Instruct-LongCTX-Merge</h1> |
|
<h2 style="text-align: center">A merge between the long-context Storywriter and the short-context Chat-oriented MPT-7b models.</h2>
|
|
|
## Model description |
|
This is a merged model, using a weighted parameter blend strategy at a 20:20:60 ratio across the following models:
|
|
|
- [60%] - 2048 CTX MPT-7b Chat: https://huggingface.co/TehVenom/MPT-7b-chat-V

- [20%] - 2048 CTX MPT-7b Instruct: https://huggingface.co/TehVenom/MPT-7b-instruct-V

- [20%] - 65k CTX MPT-7b Storywriter: https://huggingface.co/TehVenom/MPT-7b-storywriter-Apache-2.0/
|
|
|
|
|
The final model is composed of:
|
|
|
(MPT-7b Storywriter [20%] + MPT-7b Instruct [20%]) + MPT-7b Chat [60%]
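The merge scripts themselves were not published with this card, but the weighted parameter blend described above can be sketched as follows. This is a minimal, illustrative sketch (the function name and plain-float values are placeholders; in practice each `state_dict` entry would be a `torch` tensor, which supports the same scalar-multiply-and-sum operations):

```python
def blend_state_dicts(state_dicts, weights):
    """Weighted parameter blend: merged[k] = sum_i weights[i] * state_dicts[i][k].

    Works on any mapping of parameter name -> numeric value (plain floats
    here for illustration; torch tensors behave the same way).
    """
    assert abs(sum(weights) - 1.0) < 1e-6, "blend weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# The 20:20:60 recipe from above (Storywriter, Instruct, Chat):
ratio = [0.2, 0.2, 0.6]
```

Note that all three models must share the same architecture and parameter names for a blend like this to be meaningful.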
|
|
|
---- |
|
|
|
This was done to test how 'long context' tunes affect attention when merged with a model that has been trained for a different purpose, on a shorter context span.
|
Unlike the first merge [(which uses a 50/50 ratio)](https://huggingface.co/TehVenom/mpt-7b-InstructAndStorywriting-50_50-Merge), this one is weighted towards the Chat base model, both to provide another comparison point for the effects of CTX span merging and to produce a model that is primarily focused on chatting.
|
|
|
There are two objectives for this merge: first, to see how much of the 65k-Storywriter model is necessary to raise the ceiling of the final model's context size;

second, to make the base Chat model less dry, and slightly more fun, verbose, and intelligent, by blending in the literature- and Instruct-based models.
|
|
|
Due to the wide array of books sampled for MPT-7b Storywriter, this model may generate content that is considered NSFW.
|
|
|
The ideal prompting format is unknown, but try approaching it as a chatbot scenario / prompt.
|
|
|
Try starting with a two-line prompt such as:
|
``` |
|
Bot: "Hello, how are you?" |
|
You: "I am doing just fine, thank you." |
|
``` |
|
|
|
Or try giving it instructions such as: |
|
``` |
|
Below is an instruction that describes a task. Write a response that appropriately completes the request. |
|
|
|
### Instruction: |
|
Explain the concept of artificial intelligence in simple terms. |
|
|
|
### Response: |
|
Artificial Intelligence (AI) is the ability of machines and computers to make decisions and complete tasks similar to what humans can do. |
|
AI can learn from data, recognize patterns, and make predictions. |
|
AI technology can automate tedious and complex processes, helping to reduce human effort and errors, and enable more accurate and efficient decisions. |
|
AI can be used in a wide range of applications, from robotics and autonomous vehicles to healthcare and finance. It is increasingly becoming an integral part of everyday life. |
|
``` |
|
|
|
|
|
Check out the datasets involved in the MPT-7b Chat model to get a better grasp on how to prompt it well:
|
|
|
- [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)

- [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca)

- [Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)
|
|
|
Read the original model cards to understand how to run inference on this model. |