
Is it possible to perform a search using an image as the input query?

#11
by Sergen1 - opened

Hello, I need help with conducting a search using an image as the query input. I’ve checked various sources but couldn’t find clear information. Is this possible?

Yes. Compute embeddings for the query image the same way you do for the document pages, then pass them as the query set. The code below uses the older version of the library, from before the API refactor. Create the embeddings as usual:

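import torch
from torch.utils.data import DataLoader
# Pre-refactor colpali_engine helper; the import path may differ in your installed version.
from colpali_engine.utils.colpali_processing_utils import process_images

def page_embeddings(images, colpali_model, colpali_processor):
    # Batch the page images; process_images turns each batch into model inputs.
    dataloader = DataLoader(
        images,
        batch_size=4,
        shuffle=False,
        collate_fn=lambda x: process_images(colpali_processor, x),
    )
    ds = []
    for batch_doc in dataloader:
        with torch.no_grad():
            # Move the inputs to the model's device before the forward pass.
            batch_doc = {k: v.to(colpali_model.device) for k, v in batch_doc.items()}
            embeddings_doc = colpali_model(**batch_doc)
        # Split the batch into one multi-vector embedding per page, kept on the CPU.
        ds.extend(list(torch.unbind(embeddings_doc.to("cpu"))))
    return ds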

The embeddings of one or more images can now be used as the query set against the rest of the document data set. Again, the API has changed since then, so adjust this call to work with the refactored API:

scores = retriever_evaluator.evaluate(query_set, data_set)

We use this for document similarity search, and it works very well:

https://huggingface.co/blog/fsommers/document-similarity-colpali
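
To make the idea concrete, here is a minimal, dependency-free sketch of the late-interaction (MaxSim) scoring that ColPali-style retrieval relies on, applied with an image embedding as the query. It is not the library's implementation: the names maxsim_score and rank_pages and the toy 2-dimensional vectors are made up for illustration, and in practice you would use the library's own tensor-based scoring.

```python
from typing import List, Sequence

Vector = Sequence[float]
MultiVector = Sequence[Vector]  # one embedding vector per image token/patch


def maxsim_score(query: MultiVector, doc: MultiVector) -> float:
    """Late-interaction score: for each query vector, take its best
    dot product against any document vector, then sum those maxima."""
    total = 0.0
    for q in query:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc)
    return total


def rank_pages(query: MultiVector, pages: Sequence[MultiVector]) -> List[int]:
    """Return page indices ordered from most to least similar to the query."""
    scores = [maxsim_score(query, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy 2-dimensional "embeddings" (hypothetical data); real ColPali vectors
# are 128-dimensional and there are many more of them per page.
pages = [
    [[0.0, 1.0], [0.0, 1.0]],  # page 0
    [[1.0, 0.0], [0.0, 1.0]],  # page 1
    [[0.5, 0.5], [0.5, 0.5]],  # page 2
]
query_image = [[1.0, 0.0], [0.9, 0.1]]  # embedding of the query image

ranking = rank_pages(query_image, pages)
print(ranking)
```

Because the query is just another multi-vector embedding, nothing in the scoring step cares whether it came from text or from an image.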

ILLUIN Vidore org

Thanks Frank, perfect answer!

But yes, this is definitely possible. The only caveat is that the score matrix can get large, so you may want to adjust the batch size argument of processor.get_scores().

FYI the updated API is just as it is in the quickstart:

processor.process_images(x)

processor.get_scores(qs, ds)
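
As a rough illustration of why the batch size matters when many images are used as queries, the sketch below scores queries in chunks and keeps only the best match per query, so the full query-by-document score matrix never exists in memory at once. This is plain Python with hypothetical names (maxsim, score_in_batches, best_match_per_query) and toy data; it only mirrors the idea and does not reproduce how the library batches internally.

```python
from typing import Iterator, List, Sequence, Tuple

MultiVector = Sequence[Sequence[float]]


def maxsim(query: MultiVector, doc: MultiVector) -> float:
    """Sum over query vectors of the best dot product with any doc vector."""
    return sum(max(sum(a * b for a, b in zip(q, d)) for d in doc) for q in query)


def score_in_batches(
    queries: Sequence[MultiVector],
    docs: Sequence[MultiVector],
    batch_size: int,
) -> Iterator[Tuple[int, List[List[float]]]]:
    """Yield (start_index, score_block); each block has at most batch_size rows."""
    for start in range(0, len(queries), batch_size):
        chunk = queries[start:start + batch_size]
        yield start, [[maxsim(q, d) for d in docs] for q in chunk]


def best_match_per_query(
    queries: Sequence[MultiVector],
    docs: Sequence[MultiVector],
    batch_size: int = 2,
) -> List[int]:
    """Reduce each block to the top document index, then discard the block."""
    best: List[int] = []
    for _, block in score_in_batches(queries, docs, batch_size):
        for row in block:
            best.append(max(range(len(row)), key=row.__getitem__))
    return best


# Hypothetical toy embeddings: three documents and three image queries.
docs = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[0.6, 0.8], [0.6, 0.8]],
]
queries = [
    [[0.0, 1.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.6, 0.8], [0.6, 0.8]],
]

best = best_match_per_query(queries, docs, batch_size=2)
print(best)
```

A smaller batch size lowers peak memory at the cost of more iterations; the result is the same for any batch size.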

Cheers,
Manu

manu changed discussion status to closed
