README.md · LeroyDyer/SpydazWeb_AI_ImageText_Text

LeroyDyer

	---
	license: mit
	language:
	- en
	library_name: transformers
	tags:
	- art
	- medical
	- biology
	- code
	- chemistry
	metrics:
	- code_eval
	- chrf
	- charcut_mt
	- cer
	- brier_score
	- bleurt
	- bertscore
	- accuracy
	pipeline_tag: image-text-to-text
	---

	# MULTI-MODAL-MODEL
	## LeroyDyer/Mixtral_AI_Vision-Instruct_X




	Currently in test mode.


	# Vision/multimodal capabilities:

	If you want to use vision functionality:

	* You must use the latest version of [Koboldcpp](https://github.com/LostRuins/koboldcpp).

	To use this model's vision capabilities, you need to load the matching mmproj file, which can be found inside this model repo ([LeroyDyer/Mixtral_AI_Vision-Instruct_X](https://huggingface.co/LeroyDyer/Mixtral_AI_Vision-Instruct_X)).

	* You can load the mmproj file in the corresponding section of the interface:

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/UX6Ubss2EPNAT3SKGMLe0.png)

	## Vision/multimodal capabilities:

	* For loading in 4-bit, use the 4-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0

	* For loading in 8-bit, use the 8-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q8_0

	* For loading in 16-bit, use the f16 mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-f16



	## Extended capabilities:

	```
	* mistralai/Mistral-7B-Instruct-v0.1 - Prime-Base

	* ChaoticNeutrals/Eris-LelantaclesV2-7b - role play

	* ChaoticNeutrals/Eris_PrimeV3-Vision-7B - vision

	* rvv-karma/BASH-Coder-Mistral-7B - coding

	* Locutusque/Hercules-3.1-Mistral-7B - Unhinging

	* KoboldAI/Mistral-7B-Erebus-v3 - NSFW

	* Locutusque/Hyperion-2.1-Mistral-7B - CHAT

	* Severian/Nexus-IKM-Mistral-7B-Pytorch - Thinking

	* NousResearch/Hermes-2-Pro-Mistral-7B - Generalizing

	* mistralai/Mistral-7B-Instruct-v0.2 - BASE

	* Nitral-AI/ProdigyXBioMistral_7B - medical

	* Nitral-AI/Infinite-Mika-7b - 128k - Context Expansion enforcement

	* Nous-Yarn-Mistral-7b-128k - 128k - Context Expansion

	* yanismiraoui/Yarn-Mistral-7b-128k-sharded

	* ChaoticNeutrals/Eris_Prime-V2-7B - Roleplay

	```

	# "image-text-to-text"


	## using transformers

	``` python
	from transformers import AutoProcessor, LlavaForConditionalGeneration
	from transformers import BitsAndBytesConfig
	from PIL import Image
	import requests
	import torch

	# Load the weights in 4-bit to reduce GPU memory use (requires bitsandbytes)
	quantization_config = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_compute_dtype=torch.float16,
	)

	model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"

	processor = AutoProcessor.from_pretrained(model_id)
	model = LlavaForConditionalGeneration.from_pretrained(
	    model_id, quantization_config=quantization_config, device_map="auto"
	)

	# Download two sample images
	image1 = Image.open(requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
	image2 = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
	# In a notebook, display(image1) and display(image2) preview the images

	# One prompt per image; <image> marks where the image is inserted
	prompts = [
	    "USER: <image>\nWhat are the things I should be cautious about when I visit this place? What should I bring with me?\nASSISTANT:",
	    "USER: <image>\nPlease describe this image\nASSISTANT:",
	]

	inputs = processor(text=prompts, images=[image1, image2], padding=True, return_tensors="pt").to(model.device)
	for k, v in inputs.items():
	    print(k, v.shape)

	# Generate the answers and decode them back to text
	output = model.generate(**inputs, max_new_tokens=200)
	print(processor.batch_decode(output, skip_special_tokens=True))

	```
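
The prompts in the example above share one pattern: a `USER:` turn with an `<image>` placeholder, the question, and a trailing `ASSISTANT:` cue. The helper below is a minimal sketch for building such prompts consistently. `build_llava_prompt` is a hypothetical name used for illustration (it is not part of transformers), and it assumes the model expects exactly the format shown above.

```python
def build_llava_prompt(question: str) -> str:
    """Build a single-image prompt in the USER/ASSISTANT format shown above.

    Hypothetical helper for illustration; not part of transformers.
    """
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    # <image> is the placeholder the processor replaces with image features
    return f"USER: <image>\n{question}\nASSISTANT:"


questions = [
    "What are the things I should be cautious about when I visit this place? What should I bring with me?",
    "Please describe this image",
]
prompts = [build_llava_prompt(q) for q in questions]
print(prompts[1])
```

The resulting list can be passed to the processor in place of the hand-written `prompts`, with one image per prompt.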

	## Using pipeline

	``` python

	from transformers import pipeline
	from PIL import Image
	import requests

	model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"
	pipe = pipeline("image-to-text", model=model_id)
	url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"

	image = Image.open(requests.get(url, stream=True).raw)
	question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
	# Adjacent string literals are joined into one prompt
	prompt = (
	    "A chat between a curious human and an artificial intelligence assistant. "
	    "The assistant gives helpful, detailed, and polite answers to the human's questions."
	    f"###Human: <image>\n{question}###Assistant:"
	)

	outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
	print(outputs)
	```






	## Mistral ChatTemplating
	Instruction format:
	To leverage instruction fine-tuning, surround your prompt with `[INST]` and `[/INST]` tokens.
	The very first instruction should begin with a beginning-of-sentence token id; subsequent instructions should not.
	The assistant's generation is ended by the end-of-sentence token id.
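
The format described above can be sketched by hand in plain Python. This is an illustration only: `format_mistral_chat` is a hypothetical helper, the `<s>` and `</s>` strings are assumed to be this tokenizer's begin- and end-of-sentence tokens (as in the Mistral-7B base models), and the exact spacing around the tags is an assumption. The tokenizer's own chat template remains the authoritative source.

```python
BOS, EOS = "<s>", "</s>"  # assumed text form of the begin/end-of-sentence tokens


def format_mistral_chat(messages):
    """Render alternating user/assistant messages in the [INST] format.

    Hypothetical helper; whitespace around the tags is an assumption.
    """
    parts = [BOS]  # only the very first instruction is preceded by BOS
    for i, message in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            parts.append(f"[INST] {message['content']} [/INST]")
        else:
            # each assistant reply is closed by the end-of-sentence token
            parts.append(f" {message['content']}{EOS}")
    return "".join(parts)


chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]
print(format_mistral_chat(chat))
```

In practice, prefer `tokenizer.apply_chat_template`, which applies the template shipped with the model rather than a hand-written approximation.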



	```python
	from transformers import AutoTokenizer
	tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

	chat = [
	    {"role": "user", "content": "Hello, how are you?"},
	    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
	    {"role": "user", "content": "I'd like to show off how chat templating works!"},
	]

	# tokenize=False returns the formatted prompt string instead of token ids
	print(tokenizer.apply_chat_template(chat, tokenize=False))

	```

	# TextToText

	``` python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	device = "cuda" # the device to load the model onto

	model = AutoModelForCausalLM.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
	tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

	messages = [
	    {"role": "user", "content": "What is your favourite condiment?"},
	    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
	    {"role": "user", "content": "Do you have mayonnaise recipes?"},
	]

	encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

	model_inputs = encodeds.to(device)
	model.to(device)

	generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
	decoded = tokenizer.batch_decode(generated_ids)
	print(decoded[0])
	```