Emanuele Vivoli

emanuelevivoli

https://emanuelevivoli.github.io

AI & ML interests

Vision-Language models, VQA, DocumentAI

emanuelevivoli's activity

commented a paper 6 days ago

M2M-Gen: A Multimodal Framework for Automated Background Music Generation in Japanese Manga Using Large Language Models

Paper • 2410.09928 • Published 27 days ago

commented a paper about 2 months ago

One missing piece in Vision and Language: A Survey on Comics Understanding

Paper • 2409.09502 • Published Sep 14 • 23

New activity in google/paligemma-3b-mix-448 5 months ago

Torch detection bbox differs from JAX models?

#6 opened 5 months ago by

commented a paper 5 months ago

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Paper • 2406.08418 • Published Jun 12 • 28

commented a paper 6 months ago

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25 • 35

New activity in haotiz/glip-zeroshot-demo about 1 year ago

Apply for community grant: Academic project

#2 opened almost 2 years ago by