metadata

tags:
  - image-to-text
  - image-captioning
  - endpoints-template
license: bsd-3-clause
library_name: generic

Fork of salesforce/BLIP for an `image-captioning` task on 🤗 Inference Endpoints.

This repository implements a custom `image-captioning` task for 🤗 Inference Endpoints. The code for the customized pipeline is in pipeline.py. To deploy this model as an Inference Endpoint, you have to select Custom as the task so that the pipeline.py file is used. Double-check that it is selected.

Expected request payload

{
  "image": "/9j/4AAQSkZJRgABAQEBLAEsAAD/2wBDAAMCAgICAgMC...." // base64 image as bytes
}
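As a minimal sketch of how such a base64 string can be produced in Python: the helper names `encode_image_bytes` and `encode_image_file` are hypothetical and not part of this repository. The snippet only covers the encoding step, using the standard-library `base64` module.

```python
import base64
from pathlib import Path


def encode_image_bytes(raw: bytes) -> str:
    """Return the base64 text form of raw image bytes (JSON-safe ASCII)."""
    return base64.b64encode(raw).decode("ascii")


def encode_image_file(path: str) -> str:
    """Read an image file from disk and base64-encode its contents."""
    return encode_image_bytes(Path(path).read_bytes())


# JPEG files start with the bytes FF D8 FF, which is why the encoded
# string in the payload above begins with "/9j/".
jpeg_header = bytes([0xFF, 0xD8, 0xFF, 0xE0])
encoded = encode_image_bytes(jpeg_header)
```

Decoding `encoded` with `base64.b64decode` gives back the original bytes, so the encoding is lossless.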

Below is an example of how to send a request using Python and `requests`.

Run Request

1. Prepare an image.

!wget https://huggingface.co/datasets/mishig/sample_images/resolve/main/palace.jpg

2. Run the request.

import base64
import requests as r

ENDPOINT_URL = ""  # URL of your Inference Endpoint
HF_TOKEN = ""  # your Hugging Face access token

def predict(path_to_image: str):
    # Raw bytes cannot be serialized to JSON, so base64-encode the image first
    with open(path_to_image, "rb") as i:
        image = base64.b64encode(i.read()).decode("utf-8")
    payload = {
        "inputs": [image],
        "parameters": {
            "do_sample": True,
            "top_p": 0.9,
            "min_length": 5,
            "max_length": 20,
        },
    }
    response = r.post(
        ENDPOINT_URL, headers={"Authorization": f"Bearer {HF_TOKEN}"}, json=payload
    )
    return response.json()

prediction = predict(
    path_to_image="palace.jpg"
)

Example parameters depending on the decoding strategy:

Beam search

        "parameters": {
                   "num_beams":5,
                   "max_length":20
        }

Nucleus sampling

        "parameters": {
                   "num_beams":1,
                   "max_length":20,
                   "do_sample": True,
                   "top_k":50,
                   "top_p":0.95
        }

Contrastive search

        "parameters": {
                   "penalty_alpha":0.6,
                   "top_k":4,
                   "max_length":512
        }

See the `generate()` documentation for additional details.
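To switch between these strategies without retyping the dictionaries, the three examples can be collected into presets. This is only a sketch: `DECODING_PRESETS` and `build_parameters` are hypothetical names, and the values are copied from the examples above rather than tuned recommendations.

```python
from typing import Any, Dict

# Hypothetical presets mirroring the three parameter examples above.
DECODING_PRESETS: Dict[str, Dict[str, Any]] = {
    "beam_search": {"num_beams": 5, "max_length": 20},
    "nucleus_sampling": {
        "num_beams": 1,
        "max_length": 20,
        "do_sample": True,
        "top_k": 50,
        "top_p": 0.95,
    },
    "contrastive_search": {"penalty_alpha": 0.6, "top_k": 4, "max_length": 512},
}


def build_parameters(strategy: str, **overrides: Any) -> Dict[str, Any]:
    """Return a copy of the preset for `strategy`, with optional overrides."""
    if strategy not in DECODING_PRESETS:
        raise ValueError(f"unknown decoding strategy: {strategy!r}")
    params = dict(DECODING_PRESETS[strategy])  # copy so presets stay unchanged
    params.update(overrides)
    return params


# Example: nucleus sampling with a longer caption limit.
params = build_parameters("nucleus_sampling", max_length=30)
```

The resulting dictionary can be used as the value of `"parameters"` in the request payload.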

Expected output

['buckingham palace with flower beds and red flowers']
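Since the response is a list of caption strings, a small check before using it can catch failed requests early. The sketch below uses a hypothetical helper, `extract_captions`, and assumes that a failed request returns a JSON object with an `"error"` key. That error shape is an assumption and is not confirmed by this repository.

```python
from typing import Any, List


def extract_captions(response_json: Any) -> List[str]:
    """Validate the decoded JSON response and return the captions."""
    # Assumption: errors are reported as {"error": "..."}.
    if isinstance(response_json, dict) and "error" in response_json:
        raise RuntimeError(f"endpoint returned an error: {response_json['error']}")
    # The expected output is a flat list of strings, one caption per image.
    if not isinstance(response_json, list) or not all(
        isinstance(caption, str) for caption in response_json
    ):
        raise ValueError(f"unexpected response format: {response_json!r}")
    return [caption.strip() for caption in response_json]


captions = extract_captions(["buckingham palace with flower beds and red flowers"])
```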
