docci-896 3b, 10b, 28b?

#4
by lucyknada - opened

Hi there, just wondering: was a DOCCI fine-tune not done at 896 resolution for all the sizes, and is the 28B one missing? Or will those be uploaded too? Thanks!

Google org

Hi @lucyknada! So far, Google has fine-tuned only the 3B and 10B variants on DOCCI, and only at 448 resolution. I think 896 would be most helpful for tasks that benefit from finer details, such as OCR or text extraction. Just curious, is there a particular task you have in mind?
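For context on why 896 helps with fine detail, here is a minimal sketch of how the number of image tokens grows with input resolution. It assumes a ViT-style image encoder that splits a square image into non-overlapping 14x14-pixel patches (the patch size used by PaliGemma's SigLIP encoder); `image_token_count` is a hypothetical helper written for illustration, not part of any library.

```python
# Hypothetical helper for illustration only.
# Assumption: square inputs and a ViT-style encoder with 14x14-pixel patches.
PATCH_SIZE = 14


def image_token_count(resolution: int, patch_size: int = PATCH_SIZE) -> int:
    """Return the number of image tokens for a square image of the given side length."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of the patch size")
    patches_per_side = resolution // patch_size
    # One token per patch, so the count grows with the square of the side length.
    return patches_per_side * patches_per_side


for res in (224, 448, 896):
    print(res, image_token_count(res))
# 224 256
# 448 1024
# 896 4096
```

Doubling the side length from 448 to 896 quadruples the token count, so each token covers a smaller region of the image. The trade-off is a four times longer image sequence, which makes both fine-tuning and inference more expensive.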

I agree, mostly finer details in general, where the extra resolution could help. Also, the non-DOCCI fine-tunes seemed to output only a few words each, so pretty much any task is affected: desktop screenshot descriptions and guidance, OCR, regular detailed image descriptions, details in large images (e.g. traffic or manufacturing monitoring), and much more. Possibly I wasn't using the non-DOCCI ones right? But even the Hugging Face demo suffered from the same issue(s).

Google org

Yes, the demo is running on a quick VQAv2 fine-tune, whose answers are short. But as the DOCCI checkpoints demonstrate, it is indeed possible to fine-tune for much longer and more detailed responses.

Thanks! It's not just the demo, though: I got very short responses from the regular non-DOCCI checkpoints too, which is why a higher-resolution DOCCI fine-tune would be nice.
