SohanAnisetty
/

ofa-vqa-base


Sohan Anisetty

readme changes

a830132 over 1 year ago


	---
	license: apache-2.0
	---

	# OFA-Base

	## Introduction
	This is the base version of the OFA pretrained model, fine-tuned on CLEVR and a custom block stack dataset.

	The directory includes four files: `config.json`, which holds the model configuration; `vocab.json` and `merge.txt`, which are used by the OFA tokenizer; and `pytorch_model.bin`, which holds the model weights.
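Before loading the model, it can help to confirm that the checkpoint directory actually contains these files. The following is a minimal sketch, assuming the file names listed above; `missing_checkpoint_files` and `EXPECTED_FILES` are hypothetical names introduced for illustration and are not part of the OFA code.

```python
from pathlib import Path

# File names as listed in this README; adjust them if your checkout differs.
EXPECTED_FILES = ("config.json", "vocab.json", "merge.txt", "pytorch_model.bin")


def missing_checkpoint_files(ckpt_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_checkpoint_files("./ofa-vqa-base")` returns an empty list when every listed file is present, and otherwise the names that could not be found.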


	## How to use
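	Clone the code repository and the model checkpoint as shown below. The weights on the Hugging Face Hub are stored with Git LFS, so make sure Git LFS is installed (`git lfs install`) before cloning.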
	```bash
	git clone https://github.com/sohananisetty/OFA_VQA.git
	git clone https://huggingface.co/SohanAnisetty/ofa-vqa-base
	```

	Then set `ckpt_dir` to the path of the cloned `ofa-vqa-base` directory, and prepare an image for the example below.

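	```python
	from PIL import Image
	from torchvision import transforms
	# The OFA classes are expected to come from the transformers fork in the cloned
	# OFA_VQA repository (an assumption; stock transformers does not ship OFA).
	from transformers import OFATokenizer, OFAModelForVQA

	# Placeholders: point these at your cloned checkpoint and a test image.
	ckpt_dir = "./ofa-vqa-base"
	path_to_image = "path/to/image.jpg"

	# Preprocessing: convert to RGB, resize to a square, scale to [0, 1], normalize to [-1, 1].
	mean, std = [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
	resolution = 480
	patch_resize_transform = transforms.Compose([
	    lambda image: image.convert("RGB"),
	    transforms.Resize((resolution, resolution), interpolation=Image.BICUBIC),
	    transforms.ToTensor(),
	    transforms.Normalize(mean=mean, std=std),
	])

	tokenizer = OFATokenizer.from_pretrained(ckpt_dir)

	# Tokenize the question and move it to the GPU.
	txt = " what does the image describe?"
	inputs = tokenizer([txt], return_tensors="pt").input_ids
	inputs = inputs.cuda()

	# Load the image, preprocess it, and add a batch dimension.
	img = Image.open(path_to_image)
	patch_img = patch_resize_transform(img).unsqueeze(0).cuda()

	# Uses the class imported above; swap in OFAModel if your installed fork exposes that name instead.
	model = OFAModelForVQA.from_pretrained(ckpt_dir, use_cache=False).cuda()
	gen = model.generate(inputs, patch_images=patch_img, num_beams=5, no_repeat_ngram_size=3)

	print(tokenizer.batch_decode(gen, skip_special_tokens=True))
	```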
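
The preprocessing in the example above first scales pixel values to [0, 1] with `ToTensor` and then applies `Normalize` with a mean and standard deviation of 0.5 per channel, which maps each channel to [-1, 1]. A minimal sketch of that arithmetic in plain Python follows; `normalize` is a hypothetical helper written for illustration, not a torchvision function.

```python
def normalize(value, mean=0.5, std=0.5):
    """Apply the per-channel formula used by transforms.Normalize: (x - mean) / std."""
    return (value - mean) / std


# Darkest, mid-gray, and brightest pixel values after ToTensor scaling.
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
# prints:
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```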