Support for multiple images

#19
by wamozart - opened

I'm trying to pass multiple images in the prompt and ask the model to find the differences between these two images.

import requests
from PIL import Image

image1 = Image.open(requests.get(url1, stream=True).raw)
image2 = Image.open(requests.get(url2, stream=True).raw)
images = (image1, image2)
prompt = """
[INST] \nYou are given two images, determine if these are the same image or not and the reason. [/INST]
"""

It seems to ignore the second image. Any suggestions?

Llava Hugging Face org

Hey!

Yes, LLaVa-NeXT can accept multiple images as input as shown here. But since the model was not pre-trained with several images interleaved in one prompt, it might not perform well.

I recommend fine-tuning it for your use case if you want decent quality when generating based on several images.
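
For reference, here is a minimal sketch of a single two-image prompt, assuming the llava-hf/llava-v1.6-mistral-7b-hf checkpoint and its [INST] ... [/INST] format with one <image> placeholder per image (the URLs and the question are just examples; adapt them to your case):

import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to("cuda")

url1 = "https://www.ilankelman.org/stopsigns/australia.jpg"
url2 = "http://images.cocodataset.org/val2017/000000039769.jpg"
image1 = Image.open(requests.get(url1, stream=True).raw)
image2 = Image.open(requests.get(url2, stream=True).raw)

# One <image> placeholder per image, in the same order as the images list
prompt = "[INST] <image>\n<image>\nAre these two images the same? Explain why or why not. [/INST]"

inputs = processor(images=[image1, image2], text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))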

How should I use this model to generate captions for 3 million images? What resources should I use (where should I run it)? What would the compute cost be? What parallelization should I use?

Llava Hugging Face org

@LBS-LENKA you can either use TGI to serve it, which comes with many optimizations under the hood: https://github.com/huggingface/text-generation-inference
I'm also building this project with recipes for optimizing vision/multimodal models, which you can pick from depending on your hardware: https://github.com/merveenoyan/smol-vision
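
As a rough local baseline before reaching for TGI, here is a hedged sketch of batched captioning with the same llava-hf/llava-v1.6-mistral-7b-hf checkpoint; the file paths, prompt, and batch size are placeholders, actual throughput and cost depend entirely on your GPU, and for millions of images you would shard the file list across several such workers or serve the model with TGI instead:

import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
processor.tokenizer.padding_side = "left"  # left-padding is needed for batched generation
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to("cuda")

def caption_batch(paths):
    # One prompt per image; the processor matches each <image> placeholder to an image in order
    images = [Image.open(p) for p in paths]
    prompts = ["[INST] <image>\nDescribe this image in one sentence. [/INST]"] * len(images)
    inputs = processor(images=images, text=prompts, padding=True, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    return processor.batch_decode(out, skip_special_tokens=True)

# Hypothetical local files; in practice you would stream batches from your 3M-image dataset
print(caption_batch(["img_0001.jpg", "img_0002.jpg"]))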

Hi, I was trying out the example given here: https://huggingface.co/docs/transformers/main/en/model_doc/llava_next#multi-image-inference

But I am getting an error while trying to apply chat template. Below are the code and the error:

Code:

import torch
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration, AutoProcessor, AutoTokenizer
from PIL import Image
import requests

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True
)
model.to(device)

url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image_stop = Image.open(requests.get(url, stream=True).raw)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image_cats = Image.open(requests.get(url, stream=True).raw)

url = "https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/snowman.jpg"
image_snowman = Image.open(requests.get(url, stream=True).raw)

# Prepare a batch of two prompts, where the first one is a multi-turn conversation and the second is not
conversation_1 = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
            ],
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "There is a red stop sign in the image."},
            ],
    },
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What about this image? How many cats do you see?"},
            ],
    },
]

conversation_2 = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
            ],
    },
]

prompt_1 = processor.apply_chat_template(conversation_1, add_generation_prompt=True)
prompt_2 = processor.apply_chat_template(conversation_2, add_generation_prompt=True)
prompts = [prompt_1, prompt_2]

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[17], line 46
     13 conversation_1 = [
     14     {
     15         "role": "user",
   (...)
     33     },
     34 ]
     36 conversation_2 = [
     37     {
     38         "role": "user",
   (...)
     43     },
     44 ]
---> 46 prompt_1 = processor.apply_chat_template(conversation_1, add_generation_prompt=True)
     47 prompt_2 = processor.apply_chat_template(conversation_2, add_generation_prompt=True)
     48 prompts = [prompt_1, prompt_2]

File /opt/conda/lib/python3.10/site-packages/transformers/processing_utils.py:926, in ProcessorMixin.apply_chat_template(self, conversation, chat_template, tokenize, **kwargs)
    924         chat_template = self.default_chat_template
    925     else:
--> 926         raise ValueError(
    927             "No chat template is set for this processor. Please either set the `chat_template` attribute, "
    928             "or provide a chat template as an argument. See "
    929             "https://huggingface.co/docs/transformers/main/en/chat_templating for more information."
    930         )
    931 return self.tokenizer.apply_chat_template(
    932     conversation, chat_template=chat_template, tokenize=tokenize, **kwargs
    933 )

ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.

Llava Hugging Face org

@biswadeep49 which version of transformers do you have? You need at least v4.43 for chat templates; that is when we added support for them.
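
A quick way to check what is actually installed (the upgrade command in the comment assumes pip and is just one option):

# Processor chat templates need a recent transformers release (>= v4.43 per the comment above).
# If the printed version is older, upgrade, e.g. `pip install -U transformers`, then restart the kernel.
import transformers
print(transformers.__version__)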

Hi, I also ran into the same issue: ValueError: No chat template is set for this processor. Please either set the chat_template attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.
I am using transformers 4.45.0. Any suggestions?
Thank you.

Llava Hugging Face org

@zcchen I just verified that the templates work in the latest version from main and in the latest patch release. If you're in a Jupyter notebook, you might need to restart the kernel; it sometimes happens that the package isn't updated until the kernel restarts.

Also, I recommend using v4.44.2 for now, as the version on the main branch is under refactoring and might give some errors. I am working on it, but the PR is not merged yet.
