vidore/colpali · Is it possible to perform a search using an image as the input query?

9 days ago

Hello, I need help with conducting a search using an image as the query input. I’ve checked various sources but couldn’t find clear information. Is this possible?

fsommers

9 days ago

Yes. Embed the query image the same way you embed the document pages, then pass those image embeddings as the query set. The code below is for the older version of the library, from before the API refactor. Create the embeddings as usual:

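import torch
from torch.utils.data import DataLoader

# process_images is the helper from the older (pre-refactor) colpali_engine utilities.

def page_embeddings(images, colpali_model, colpali_processor):
    """Embed a list of page images; returns one multi-vector tensor per page."""
    dataloader = DataLoader(
        images,
        batch_size=4,
        shuffle=False,
        collate_fn=lambda x: process_images(colpali_processor, x),
    )
    ds = []
    for batch_doc in dataloader:
        with torch.no_grad():
            # Move the batch to the model's device and run the forward pass
            batch_doc = {k: v.to(colpali_model.device) for k, v in batch_doc.items()}
            embeddings_doc = colpali_model(**batch_doc)
        # Split the batch into per-page tensors and keep them on the CPU
        ds.extend(list(torch.unbind(embeddings_doc.to("cpu"))))
    return ds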

These embeddings can now be used as the query set and scored against the embeddings of the rest of the document data set. Again, the API has since changed, so you will need to adapt this call to the refactored API:

scores = retriever_evaluator.evaluate(query_set, data_set)
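To make it clearer why image embeddings can stand in for text queries, here is a minimal sketch of ColBERT-style late-interaction (MaxSim) scoring, which is my understanding of what the evaluator computes. It is a dependency-free toy in plain Python, not the library's implementation: the names `maxsim_score` and `score_matrix` and the two-dimensional vectors are made up for illustration. The point is that the scorer only sees two sets of vectors, so it does not matter whether the query vectors came from text tokens or from image patches.

```python
from typing import List, Sequence

Vector = Sequence[float]
MultiVector = Sequence[Vector]  # one embedding vector per token or image patch


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query: MultiVector, doc: MultiVector) -> float:
    """For each query vector take its best match in the doc, then sum."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def score_matrix(queries: Sequence[MultiVector],
                 docs: Sequence[MultiVector]) -> List[List[float]]:
    """Return scores[i][j] = similarity of query i to document j."""
    return [[maxsim_score(q, d) for d in docs] for q in queries]


# Hypothetical toy "pages", each with two 2-d patch embeddings.
page_a = [[1.0, 0.0], [0.0, 1.0]]
page_b = [[1.0, 0.0], [1.0, 0.0]]

# Use page_a as an image query against both pages.
scores = score_matrix([page_a], [page_a, page_b])
best_match = max(range(len(scores[0])), key=lambda j: scores[0][j])
```

Here the query page scores highest against itself, which is the behavior you would expect from a document similarity search.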

We use this for document similarity search, and it works very well:

https://huggingface.co/blog/fsommers/document-similarity-colpali

manu

ILLUIN Vidore org 9 days ago

Thanks Frank, perfect answer!

But yes, this is definitely possible. The only caveat is that the similarity matrix can get large, since an image query has many more embedding vectors than a text query, so you may want to adjust the batch size argument of processor.get_scores().
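A rough back-of-the-envelope estimate shows why the batch size matters. The sketch below assumes the scorer materializes a full vector-by-vector similarity block for each pair of query and document batches, that values are stored as 4-byte floats, and that each page yields about 1030 embedding vectors. All three are assumptions for illustration, and `similarity_block_bytes` is a hypothetical helper, not part of the library.

```python
def similarity_block_bytes(n_queries: int, n_docs: int,
                           query_vectors: int, doc_vectors: int,
                           bytes_per_value: int = 4) -> int:
    """Size of an n_queries x n_docs x query_vectors x doc_vectors block."""
    return n_queries * n_docs * query_vectors * doc_vectors * bytes_per_value


VECTORS_PER_PAGE = 1030  # assumed number of embedding vectors per page image
GIB = 1024 ** 3

# Image queries against image documents, for two hypothetical batch sizes.
large = similarity_block_bytes(128, 128, VECTORS_PER_PAGE, VECTORS_PER_PAGE)
small = similarity_block_bytes(16, 16, VECTORS_PER_PAGE, VECTORS_PER_PAGE)

print(f"batch 128: {large / GIB:.2f} GiB")
print(f"batch 16:  {small / GIB:.2f} GiB")
```

Under these assumptions a batch of 128 image queries against 128 pages needs roughly 65 GiB for the intermediate block, while a batch of 16 needs about 1 GiB. A short text query with a few dozen vectors would be far cheaper, which is why the default can work for text but not for images.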

FYI, the updated API is the same as shown in the quickstart:

processor.process_images(x)

processor.get_scores(qs, ds)

Cheers,
Manu

manu changed discussion status to closed 9 days ago