---
license: cc-by-nc-sa-4.0
datasets:
- camel-ai/code
- ehartford/wizard_vicuna_70k_unfiltered
- anon8231489123/ShareGPT_Vicuna_unfiltered
- teknium1/GPTeacher/roleplay-instruct-v2-final
- teknium1/GPTeacher/codegen-isntruct
- timdettmers/openassistant-guanaco
- camel-ai/math
- project-baize/baize-chatbot/medical_chat_data
- project-baize/baize-chatbot/quora_chat_data
- project-baize/baize-chatbot/stackoverflow_chat_data
- camel-ai/biology
- camel-ai/chemistry
- camel-ai/ai_society
- jondurbin/airoboros-gpt4-1.2
- LongConversations
- camel-ai/physics
tags:
- Composer
- MosaicML
- llm-foundry
inference: false
---
# MPT-7B-Chat-8k
License: _CC-By-NC-SA-4.0_ (non-commercial use only)
## Model Date
July 18, 2023
## Model License
_CC-By-NC-SA-4.0_ (non-commercial use only)
## Documentation
* [Blog post: MPT-7B-8k](https://www.mosaicml.com/blog/long-context-mpt-7b-8k)
* [Codebase (mosaicml/llm-foundry repo)](https://github.com/mosaicml/llm-foundry/)
* Questions: Feel free to contact us via the [MosaicML Community Slack](https://mosaicml.me/slack)!
## How to Use
The following example requires the `auto-gptq` package, which you can install with `pip install auto-gptq`.
Example script:
```python
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer, TextStreamer

quantized_model = "casperhansen/mpt-7b-8k-chat-gptq"

# Load the tokenizer and the quantized model onto the first GPU
print('loading model...')
tokenizer = AutoTokenizer.from_pretrained(quantized_model, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(quantized_model, device="cuda:0", trust_remote_code=True)

# The model expects the ChatML-style prompt format shown below
prompt_format = """<|im_start|>system
A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.<|im_end|>
<|im_start|>user
{text}<|im_end|>
<|im_start|>assistant
"""
prompt = prompt_format.format(text="What is the difference between nuclear fusion and fission?")

# Generate a response, streaming decoded tokens to stdout as they are produced
print('generating...')
streamer = TextStreamer(tokenizer, skip_special_tokens=True)
tokens = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**tokens, max_length=512, streamer=streamer)
```
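The quantized model can also be wrapped in the standard `transformers` text-generation pipeline instead of calling `generate` directly. Below is a minimal sketch reusing `model`, `tokenizer`, and `prompt` from the script above; the sampling parameters are illustrative choices, not values prescribed for this model:
```python
from transformers import TextGenerationPipeline

# Build a text-generation pipeline around the already-loaded quantized model
pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)

# max_new_tokens, do_sample, and temperature are illustrative defaults,
# not settings taken from the model card
result = pipe(prompt, max_new_tokens=512, do_sample=True, temperature=0.7,
              return_full_text=False)
print(result[0]["generated_text"])
```
`return_full_text=False` keeps only the assistant's completion rather than echoing the prompt back in the output.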