Raw Model Input/output?

by Bardia323 - opened Sep 22, 2023

Sep 22, 2023

Hello Facebook,

My understanding of the Donut model is that an image is first passed through a pretrained Donut processor, which hands a vectorized representation to the base Donut model.
The GitHub repo for Nougat preprocesses PDF and HTML files to build datasets for fine-tuning. If I want to fine-tune the model on specific images instead of PDFs and per-image raw text, would I be able to do so using the same procedure as for the base Donut model?

Thank you!
Bardia

nielsr

Sep 22, 2023

edited Sep 22, 2023

Hi,

DonutProcessor is not really "pretrained"; it's just an object that prepares the image for the model. It basically converts a Pillow image (or NumPy array) into a PyTorch tensor called pixel_values by resizing the image and normalizing the color channels. The reason we do processor = DonutProcessor.from_pretrained("facebook/nougat-base") is to make sure that the preprocessing settings matching the model at "facebook/nougat-base" are the ones that get used.
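To make the "resize and normalize" step concrete, here is a minimal NumPy-only sketch of that kind of preprocessing. It is not the actual DonutProcessor code: the target size, mean, and std below are made-up placeholder values (the real ones come from the checkpoint's preprocessing config), the helper names are hypothetical, and it returns a NumPy array rather than a PyTorch tensor to keep the example dependency-light.

```python
import numpy as np

# Placeholder settings, assumed for illustration only. The real values are
# stored with the checkpoint and loaded by from_pretrained.
TARGET_SIZE = (8, 6)  # (height, width)
MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def resize_nearest(image, size):
    """Resize an (H, W, C) array to `size` using nearest-neighbour sampling."""
    h, w = image.shape[:2]
    new_h, new_w = size
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source column for each output column
    return image[rows[:, None], cols[None, :]]


def to_pixel_values(image):
    """Turn an (H, W, 3) uint8 image into a (1, 3, H', W') float32 batch."""
    resized = resize_nearest(image, TARGET_SIZE)
    scaled = resized.astype(np.float32) / 255.0   # map 0..255 to 0..1
    normalized = (scaled - MEAN) / STD            # per-channel normalization
    return normalized.transpose(2, 0, 1)[None, ...]  # channels first, add batch dim


# A random stand-in for a page image.
image = np.random.default_rng(0).integers(0, 256, size=(32, 24, 3), dtype=np.uint8)
pixel_values = to_pixel_values(image)
print(pixel_values.shape, pixel_values.dtype)
```

The point is only that the processor carries settings, not learned weights, so loading it from the same checkpoint as the model keeps the two consistent.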

Bardia323

Sep 22, 2023

Hey Niels!

Thanks for the quick response and explanation. Given that, I assume everything else works the same as with the standard Donut model. I had follow-up questions about image size and token length, but I found the answers in the paper. I'll post any further issues I run into in this discussion as I start playing around with it.

Cheers!
Bardia
