Using multiple prompts during inference

#28
by ksooklall - opened

I understand we can use different task prompts, such as prompt = "<OD>" or prompt = "<CAPTION>", but is there a way to get the results of both tasks at once? I don't want to run inference twice. Something like:

prompt = "<OD_CAPTION>"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, torch_dtype)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

and the result would be:

result: {'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]}, '<CAPTION>': 'A green car parked in front of a yellow building.'}
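As far as I know, Florence-2 does not define a combined task token like "<OD_CAPTION>", but you can batch the two task prompts into one processor/generate call so the model only runs a single inference pass, then merge the parsed outputs into one dict. A rough sketch under that assumption (the helpers run_tasks and merge_task_results are hypothetical names, not part of the library; the processor/model usage follows the standard Florence-2 example):

```python
# Hypothetical sketch: run several Florence-2 task prompts in a single
# batched generate() call instead of looping inference once per prompt.
# run_tasks and merge_task_results are illustrative helpers, not library APIs.

def merge_task_results(tasks, per_task_results):
    """Combine one parsed result per task prompt into a single dict,
    e.g. {'<OD>': {...}, '<CAPTION>': '...'}."""
    merged = {}
    for task, parsed in zip(tasks, per_task_results):
        # processor.post_process_generation returns a {task: value} dict;
        # fall back to wrapping the raw value if it doesn't.
        merged.update(parsed if isinstance(parsed, dict) else {task: parsed})
    return merged


def run_tasks(model, processor, image, tasks, device, torch_dtype):
    # One batched call: N prompts -> N rows of input_ids / pixel_values,
    # with the same image repeated for each task prompt.
    inputs = processor(
        text=tasks,
        images=[image] * len(tasks),
        return_tensors="pt",
    ).to(device, torch_dtype)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
    )
    texts = processor.batch_decode(generated_ids, skip_special_tokens=False)
    parsed = [
        processor.post_process_generation(
            text, task=task, image_size=(image.width, image.height)
        )
        for task, text in zip(tasks, texts)
    ]
    return merge_task_results(tasks, parsed)
```

Usage would then be something like run_tasks(model, processor, image, ["<OD>", "<CAPTION>"], device, torch_dtype), which pays the model load and batch overhead once rather than running generate() twice sequentially.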
