---
license: mit
language:
- en
library_name: transformers
tags:
- art
- medical
- biology
- code
- chemistry
metrics:
- code_eval
- chrf
- charcut_mt
- cer
- brier_score
- bleurt
- bertscore
- accuracy
pipeline_tag: image-text-to-text
---

# MULTI-MODAL-MODEL
## LeroyDyer/Mixtral_AI_Vision-Instruct_X




This model is currently in test mode.


# Vision/multimodal capabilities:

To use the vision functionality:

* You must use the latest version of [Koboldcpp](https://github.com/LostRuins/koboldcpp).

* You must load the matching **mmproj** file, which can be found inside this model repo ([LeroyDyer/Mixtral_AI_Vision-Instruct_X](https://huggingface.co/LeroyDyer/Mixtral_AI_Vision-Instruct_X)).

* You can load the **mmproj** file in the corresponding section of the Koboldcpp interface:

 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/UX6Ubss2EPNAT3SKGMLe0.png)

## Vision/multimodal capabilities:

* For 4-bit loading, use the 4-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0

* For 8-bit loading, use the 8-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q8_0

* For 16-bit (f16) loading, use the f16 mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-f16
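
As a rough illustration, the pairing above can be written as a small lookup table. This is only a sketch: the `MMPROJ_FILES` dictionary, its precision keys, and the `mmproj_for` helper are hypothetical names chosen here, and the file names are copied from the list as written (without a file extension), so check the repo for the exact names.

``` python
# Hypothetical lookup table: the keys are illustrative labels,
# the values are the mmproj file names listed above.
MMPROJ_FILES = {
    "4bit": "mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0",
    "8bit": "mmproj-Mixtral_AI_Vision-Instruct_X-Q8_0",
    "f16": "mmproj-Mixtral_AI_Vision-Instruct_X-f16",
}


def mmproj_for(precision):
    """Return the mmproj file name that matches the chosen precision."""
    try:
        return MMPROJ_FILES[precision]
    except KeyError:
        options = ", ".join(sorted(MMPROJ_FILES))
        raise ValueError(f"unknown precision {precision!r}; expected one of: {options}") from None


print(mmproj_for("8bit"))
```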



## Extended capabilities:

```
  * mistralai/Mistral-7B-Instruct-v0.1 - Prime-Base

  * ChaoticNeutrals/Eris-LelantaclesV2-7b - role play
 
  * ChaoticNeutrals/Eris_PrimeV3-Vision-7B - vision

  * rvv-karma/BASH-Coder-Mistral-7B - coding

  * Locutusque/Hercules-3.1-Mistral-7B - Unhinging

  * KoboldAI/Mistral-7B-Erebus-v3 - NSFW

  * Locutusque/Hyperion-2.1-Mistral-7B - CHAT

  * Severian/Nexus-IKM-Mistral-7B-Pytorch - Thinking

  * NousResearch/Hermes-2-Pro-Mistral-7B - Generalizing
 
  * mistralai/Mistral-7B-Instruct-v0.2 - BASE

  * Nitral-AI/ProdigyXBioMistral_7B - medical

  * Nitral-AI/Infinite-Mika-7b - 128k - Context Expansion enforcement

  * Nous-Yarn-Mistral-7b-128k - 128k - Context Expansion
 
  * yanismiraoui/Yarn-Mistral-7b-128k-sharded

  * ChaoticNeutrals/Eris_Prime-V2-7B - Roleplay

```

# "image-text-to-text"


## using transformers

``` python
from transformers import AutoProcessor, LlavaForConditionalGeneration
from transformers import BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)


model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, quantization_config=quantization_config, device_map="auto")


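import requests
from PIL import Image

# Download two sample images to run through the model
image1 = Image.open(requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
image2 = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
# display() only exists in Jupyter/IPython; remove these two lines in a plain script
display(image1)
display(image2)

# One prompt per image; <image> marks where the image features are inserted
prompts = [
    "USER: <image>\nWhat are the things I should be cautious about when I visit this place? What should I bring with me?\nASSISTANT:",
    "USER: <image>\nPlease describe this image\nASSISTANT:",
]

# Tokenize the prompts, preprocess the images, and move the batch to the model's device
inputs = processor(text=prompts, images=[image1, image2], padding=True, return_tensors="pt").to(model.device)
for k, v in inputs.items():
    print(k, v.shape)

# Generate a completion for each prompt and decode it back to text
output = model.generate(**inputs, max_new_tokens=200)
for text in processor.batch_decode(output, skip_special_tokens=True):
    print(text)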

```

## Using pipeline

``` python

from transformers import pipeline
from PIL import Image
import requests

model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"
pipe = pipeline("image-to-text", model=model_id)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"

image = Image.open(requests.get(url, stream=True).raw)
question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
# Adjacent string literals are concatenated, so the prompt is one continuous string
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
    f"###Human: <image>\n{question}###Assistant:"
)

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs)
```



  


## Mistral ChatTemplating
### Instruction format

To leverage instruction fine-tuning, surround your prompt with `[INST]` and `[/INST]` tokens. The very first instruction should begin with a beginning-of-sentence (BOS) token id; subsequent instructions should not. The assistant's generation is terminated by the end-of-sentence (EOS) token id.
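
To make the format concrete, here is a minimal sketch that assembles such a prompt by hand. The `build_mistral_prompt` helper is hypothetical, and the `<s>` / `</s>` strings and the exact whitespace around the tags are assumptions based on the usual Mistral-Instruct layout; the tokenizer's own chat template is the authoritative source and may differ slightly.

``` python
def build_mistral_prompt(messages, bos="<s>", eos="</s>"):
    """Hypothetical helper: build an [INST]-style prompt from alternating chat turns."""
    prompt = bos  # the BOS marker appears once, before the first instruction only
    for i, message in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            # Each user instruction is wrapped in [INST] ... [/INST]
            prompt += f"[INST] {message['content']} [/INST]"
        else:
            # Each assistant reply is closed by the EOS marker
            prompt += f" {message['content']}{eos}"
    return prompt


chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]
print(build_mistral_prompt(chat))
```

In practice, `tokenizer.apply_chat_template` does this work for you, so a hand-built string like this is mainly useful for inspecting or debugging what the model receives.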



```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

chat = [
   {"role": "user", "content": "Hello, how are you?"},
   {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
   {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

# tokenize=False returns the formatted prompt string instead of token ids
print(tokenizer.apply_chat_template(chat, tokenize=False))

```

# TextToText

``` python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```