Raw Model Input/Output?
Hello Facebook,
My understanding of the Donut model is that an image goes into a pretrained Donut processor, which then passes a vectorized representation to the base Donut model.
The GitHub repo for Nougat does some preprocessing on PDF and HTML files to build and finetune datasets. If I wish to finetune the model on specific images, rather than on PDFs with raw text per image, would I be able to do so using the same procedure as for the base Donut model?
Thank you!
Bardia
Hi,
DonutProcessor is not really "pretrained"; it's just an object that prepares the image for the model. It basically converts a Pillow image (or NumPy array) to a PyTorch tensor called pixel_values by resizing the image and normalizing the color channels. The reason we do processor = DonutProcessor.from_pretrained("facebook/nougat-base") is to make sure we use the preprocessing settings that match the model at "facebook/nougat-base".
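To make the resize-and-normalize step concrete, here is a rough NumPy-only sketch of the kind of work a processor does. It is not the library's implementation: the function name to_pixel_values, the target size, and the mean/std values are assumptions for illustration, and the real settings come from the checkpoint's preprocessing config. It also returns a NumPy array rather than a PyTorch tensor so it runs without extra dependencies.

```python
import numpy as np

# Assumed values for illustration only; the real ones are read from the
# checkpoint's preprocessing config by from_pretrained.
TARGET_HEIGHT, TARGET_WIDTH = 896, 672
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_pixel_values(image, height=TARGET_HEIGHT, width=TARGET_WIDTH):
    """Resize an HxWx3 uint8 array and normalize its color channels.

    Hypothetical helper: a simplified stand-in for what a processor does.
    """
    src_h, src_w = image.shape[:2]
    # Nearest-neighbour resize: choose a source row/column per target pixel.
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    resized = image[rows][:, cols]
    # Rescale to [0, 1], then normalize each color channel.
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - MEAN) / STD
    # Channels-first layout with a leading batch dimension: (1, 3, H, W).
    return normalized.transpose(2, 0, 1)[np.newaxis]


page = np.full((100, 80, 3), 255, dtype=np.uint8)  # a blank white "page"
pixel_values = to_pixel_values(page)
print(pixel_values.shape, pixel_values.dtype)
```

With the actual processor, the equivalent step is a single call along the lines of processor(image, return_tensors="pt").pixel_values, so for finetuning on your own images you only need to hand it the images.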
Hey Niels!
Thanks for the quick response and explanation. Given your explanation, I assume everything else stays the same as in the standard Donut model. I had follow-up questions about things like image size and token length, but found the answers in the paper. Thanks again. I'll post any further issues I run into in the discussion as I start playing around with it.
Cheers!
Bardia