---
license: apache-2.0
---

# OFA-Base

## Introduction

This is the **base** version of the OFA pretrained model, finetuned on CLEVR and a custom block-stack dataset. The repository includes four files: `config.json` (model configuration), `vocab.json` and `merges.txt` (files for our OFA tokenizer), and `pytorch_model.bin` (model weights).

## How to use

Download the inference code and the model as shown below (cloning the model repository requires `git-lfs` so the weights download correctly).

```bash
git clone https://github.com/sohananisetty/OFA_VQA.git
git clone https://huggingface.co/SohanAnisetty/ofa-vqa-base
```

Then set `ckpt_dir` to the path of the downloaded `ofa-vqa-base` directory and prepare an image for the example below.

```python
from PIL import Image
from torchvision import transforms
from transformers import OFATokenizer, OFAModelForVQA

ckpt_dir = "ofa-vqa-base"       # path to the cloned checkpoint directory
path_to_image = "example.jpg"   # path to your test image

# Preprocessing: convert to RGB, resize to the model's 480x480 input
# resolution, and normalize to the [-1, 1] range the model expects.
mean, std = [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
resolution = 480
patch_resize_transform = transforms.Compose([
    lambda image: image.convert("RGB"),
    transforms.Resize((resolution, resolution), interpolation=Image.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=mean, std=std)
])

tokenizer = OFATokenizer.from_pretrained(ckpt_dir)

txt = " what does the image describe?"
inputs = tokenizer([txt], return_tensors="pt").input_ids
inputs = inputs.cuda()
img = Image.open(path_to_image)
patch_img = patch_resize_transform(img).unsqueeze(0).cuda()

model = OFAModelForVQA.from_pretrained(ckpt_dir, use_cache=False).cuda()
gen = model.generate(inputs, patch_images=patch_img, num_beams=5, no_repeat_ngram_size=3)
print(tokenizer.batch_decode(gen, skip_special_tokens=True))
```
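
Since the checkpoint is finetuned for VQA, you can also ask a question about the image instead of requesting a caption. The snippet below is a minimal sketch that reuses the `tokenizer`, `model`, and `patch_img` objects from the example above; the question text is a hypothetical illustration for a block-stack image, not part of the original card.

```python
# Hypothetical VQA-style question for a block-stack image; adjust to your image.
# Note the leading space, matching the prompt format used above.
question = " what color is the block on top of the stack?"
q_inputs = tokenizer([question], return_tensors="pt").input_ids.cuda()

# Same generation settings as the captioning example above.
answer = model.generate(q_inputs, patch_images=patch_img,
                        num_beams=5, no_repeat_ngram_size=3)
print(tokenizer.batch_decode(answer, skip_special_tokens=True))
```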