---
license: cc-by-nc-sa-4.0
datasets:
- camel-ai/code
- ehartford/wizard_vicuna_70k_unfiltered
- anon8231489123/ShareGPT_Vicuna_unfiltered
- teknium1/GPTeacher/roleplay-instruct-v2-final
- teknium1/GPTeacher/codegen-isntruct
- timdettmers/openassistant-guanaco
- camel-ai/math
- project-baize/baize-chatbot/medical_chat_data
- project-baize/baize-chatbot/quora_chat_data
- project-baize/baize-chatbot/stackoverflow_chat_data
- camel-ai/biology
- camel-ai/chemistry
- camel-ai/ai_society
- jondurbin/airoboros-gpt4-1.2
- LongConversations
- camel-ai/physics
tags:
- Composer
- MosaicML
- llm-foundry
inference: false
---

# MPT-7B-Chat-8k

License: _CC-By-NC-SA-4.0_ (non-commercial use only)

## Model Date

July 18, 2023

## Model License

_CC-By-NC-SA-4.0_ (non-commercial use only)

## Documentation

* [Blog post: MPT-7B-8k](https://www.mosaicml.com/blog/long-context-mpt-7b-8k)
* [Codebase (mosaicml/llm-foundry repo)](https://github.com/mosaicml/llm-foundry/)
* Questions: Feel free to contact us via the [MosaicML Community Slack](https://mosaicml.me/slack)!

## How to Use

You need `auto-gptq` installed to run the following example:

`pip install auto-gptq`

Example script:

```
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer, TextGenerationPipeline, TextStreamer

quantized_model = "casperhansen/mpt-7b-8k-chat-gptq"

print('loading model...')

# load quantized model to the first GPU
tokenizer = AutoTokenizer.from_pretrained(quantized_model, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(quantized_model, device="cuda:0", trust_remote_code=True)

prompt_format = """<|im_start|>system
A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.<|im_end|>
<|im_start|>user
{text}<|im_end|>
<|im_start|>assistant
"""
prompt = prompt_format.format(text="What is the difference between nuclear fusion and fission?")

print('generating...')

# or you can also use pipeline
streamer = TextStreamer(tokenizer, skip_special_tokens=True)
tokens = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**tokens, max_length=512, streamer=streamer)
```
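
The `# or you can also use pipeline` comment in the script above refers to the `transformers` `TextGenerationPipeline`, which accepts the quantized model directly. Below is a minimal sketch of that alternative; it assumes the `model`, `tokenizer`, and `prompt` variables already defined in the example script, and the `max_length=512` setting is simply carried over from above rather than being a recommended value.

```
from transformers import TextGenerationPipeline

# Wrap the quantized model and tokenizer in a text-generation pipeline
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)

# Generate a completion for the same ChatML-formatted prompt;
# max_length mirrors the value used in the example script above
result = pipeline(prompt, max_length=512)
print(result[0]["generated_text"])
```

Unlike the `TextStreamer` path in the example script, the pipeline returns the full generated text at once instead of streaming tokens as they are produced.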