---
license: mit
language:
- en
library_name: transformers
tags:
- art
- medical
- biology
- code
- chemistry
metrics:
- code_eval
- chrf
- charcut_mt
- cer
- brier_score
- bleurt
- bertscore
- accuracy
pipeline_tag: image-text-to-text
---
# MULTI-MODAL-MODEL
## LeroyDyer/Mixtral_AI_Vision-Instruct_X
Currently in test mode.
# Vision/multimodal capabilities:
If you want to use vision functionality:
* You must use the latest version of [Koboldcpp](https://github.com/LostRuins/koboldcpp).
* To use the model's **vision** capabilities, you need to load the matching **mmproj** file, which can be found inside this model repo ([LeroyDyer/Mixtral_AI_Vision-Instruct_X](https://huggingface.co/LeroyDyer/Mixtral_AI_Vision-Instruct_X)).
* You can load the **mmproj** file in the corresponding section of the interface:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/UX6Ubss2EPNAT3SKGMLe0.png)
## Vision/multimodal capabilities:
* For loading 4-bit, use the 4-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0
* For loading 8-bit, use the 8-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q8_0
* For loading 16-bit, use the f16 mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-f16
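The pairing listed above can be written down as a small lookup. This is only a sketch: the helper name `mmproj_for` and the precision labels used as keys are assumptions made for illustration, and the names are returned exactly as they appear in the list, without guessing a file extension.

```python
# Hypothetical helper: the dictionary keys are illustrative labels,
# the values are the mmproj names listed in this model card.
MMPROJ_FILES = {
    "4bit": "mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0",
    "8bit": "mmproj-Mixtral_AI_Vision-Instruct_X-Q8_0",
    "f16": "mmproj-Mixtral_AI_Vision-Instruct_X-f16",
}


def mmproj_for(precision: str) -> str:
    """Return the mmproj name that matches the requested precision label."""
    key = precision.strip().lower()
    if key not in MMPROJ_FILES:
        known = ", ".join(sorted(MMPROJ_FILES))
        raise ValueError(f"unknown precision {precision!r}; expected one of: {known}")
    return MMPROJ_FILES[key]


print(mmproj_for("4bit"))
```

Failing loudly on an unknown label is a deliberate choice here, since loading a projector that does not match the model file is easy to miss otherwise.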
## Extended capabilities:
```
* mistralai/Mistral-7B-Instruct-v0.1 - Prime-Base
* ChaoticNeutrals/Eris-LelantaclesV2-7b - role play
* ChaoticNeutrals/Eris_PrimeV3-Vision-7B - vision
* rvv-karma/BASH-Coder-Mistral-7B - coding
* Locutusque/Hercules-3.1-Mistral-7B - Unhinging
* KoboldAI/Mistral-7B-Erebus-v3 - NSFW
* Locutusque/Hyperion-2.1-Mistral-7B - CHAT
* Severian/Nexus-IKM-Mistral-7B-Pytorch - Thinking
* NousResearch/Hermes-2-Pro-Mistral-7B - Generalizing
* mistralai/Mistral-7B-Instruct-v0.2 - BASE
* Nitral-AI/ProdigyXBioMistral_7B - medical
* Nitral-AI/Infinite-Mika-7b - 128k - Context Expansion enforcement
* Nous-Yarn-Mistral-7b-128k - 128k - Context Expansion
* yanismiraoui/Yarn-Mistral-7b-128k-sharded
* ChaoticNeutrals/Eris_Prime-V2-7B - Roleplay
```
# "image-text-to-text"
## Using transformers
``` python
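import requests
import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

# Load the model in 4-bit to reduce GPU memory use (requires bitsandbytes).
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, quantization_config=quantization_config, device_map="auto"
)

# Download two example images.
image1 = Image.open(requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
image2 = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
# display() is only available in Jupyter/IPython notebooks; remove these lines in a plain script.
display(image1)
display(image2)

# One prompt per image; <image> marks where the image features are inserted.
prompts = [
    "USER: <image>\nWhat are the things I should be cautious about when I visit this place? What should I bring with me?\nASSISTANT:",
    "USER: <image>\nPlease describe this image\nASSISTANT:",
]
inputs = processor(text=prompts, images=[image1, image2], padding=True, return_tensors="pt").to("cuda")
# Inspect the shape of each tensor the processor produced.
for k, v in inputs.items():
    print(k, v.shape)

# Generate an answer for each prompt and print only the assistant's part.
output = model.generate(**inputs, max_new_tokens=200)
for text in processor.batch_decode(output, skip_special_tokens=True):
    print(text.split("ASSISTANT:")[-1])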
```
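Both prompts in the example above share the same `USER: <image>\n...\nASSISTANT:` layout. The sketch below wraps that layout in a helper so a list of questions can be turned into prompts; the function name `build_llava_prompt` is hypothetical, and the layout is taken from the prompts shown above rather than from any official specification.

```python
def build_llava_prompt(question: str) -> str:
    """Wrap a question in the USER/ASSISTANT layout used by the example prompts.

    The single <image> placeholder marks where the image is attached, so each
    prompt built this way expects exactly one image.
    """
    return f"USER: <image>\n{question}\nASSISTANT:"


questions = [
    "What are the things I should be cautious about when I visit this place? What should I bring with me?",
    "Please describe this image",
]
prompts = [build_llava_prompt(q) for q in questions]

for p in prompts:
    print(repr(p))
```

The resulting `prompts` list can be passed to the processor together with one image per prompt, in the same order.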
## Using pipeline
``` python
import requests
from PIL import Image
from transformers import pipeline

# The model id must be a string.
model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"
pipe = pipeline("image-to-text", model=model_id)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
image = Image.open(requests.get(url, stream=True).raw)

question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
# Adjacent string literals are concatenated into a single prompt string.
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
    f"###Human: <image>\n{question}###Assistant:"
)
outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs)
```
## Mistral ChatTemplating
Instruction format:
To leverage instruction fine-tuning, surround your prompt with the [INST] and [/INST] tokens.
Only the very first instruction should begin with a beginning-of-sentence (BOS) token id; subsequent instructions should not.
The assistant's generation is terminated by the end-of-sentence (EOS) token id.
```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
chat = [
{"role": "user", "content": "Hello, how are you?"},
{"role": "assistant", "content": "I'm doing great. How can I help you today?"},
{"role": "user", "content": "I'd like to show off how chat templating works!"},
]
# tokenize=False returns the formatted prompt as a string instead of token ids.
print(tokenizer.apply_chat_template(chat, tokenize=False))
```
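To make the instruction format described above concrete, here is a hand-rolled sketch that assembles the prompt string without a tokenizer. The function name `build_mistral_prompt` and the `<s>` / `</s>` literals standing in for the BOS and EOS tokens are assumptions for illustration; the exact whitespace is decided by the model's own chat template, so `tokenizer.apply_chat_template` remains the reliable way to build real prompts.

```python
def build_mistral_prompt(messages, bos="<s>", eos="</s>"):
    """Assemble an [INST]-style prompt from alternating user/assistant turns.

    BOS is emitted once at the very start; each assistant turn is closed
    with EOS; each user turn is wrapped in [INST] ... [/INST].
    """
    prompt = bos
    for index, message in enumerate(messages):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            prompt += f"[INST] {message['content']} [/INST]"
        else:
            prompt += f" {message['content']}{eos}"
    return prompt


chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]
print(build_mistral_prompt(chat))
```

Comparing this string with the output of `apply_chat_template` for the same conversation is a quick way to see where the real template differs, for example in spacing around the tokens.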
# TextToText
``` python
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
messages = [
{"role": "user", "content": "What is your favourite condiment?"},
{"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
{"role": "user", "content": "Do you have mayonnaise recipes?"}
]
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
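The decoded string printed above contains the whole conversation, prompt included. The sketch below shows one way to keep only the final reply by cutting at the last `[/INST]` marker; the helper name `extract_last_reply` is hypothetical, and it assumes the decoded text uses the `[INST]` / `[/INST]` markers and `</s>` as the end-of-sentence token.

```python
def extract_last_reply(decoded: str, inst_close: str = "[/INST]", eos: str = "</s>") -> str:
    """Return the text generated after the final [/INST] marker.

    If the marker is absent, the whole string is treated as the reply.
    """
    # rpartition splits on the last occurrence of the marker.
    _, _, reply = decoded.rpartition(inst_close)
    # Keep only the text before the end-of-sentence token, if one was generated.
    reply = reply.split(eos, 1)[0]
    return reply.strip()


sample = "<s>[INST] Do you have mayonnaise recipes? [/INST] Yes, here is a simple one.</s>"
print(extract_last_reply(sample))
```

An alternative that avoids string matching is to slice `generated_ids` past the length of `model_inputs` before decoding, so only the newly generated tokens are decoded.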