Leo Tronchon PRO

Leyo

AI & ML interests

Multimodal, Self-Supervised Learning

Recent Activity

updated a dataset 3 days ago

Leyo/moss_test_r8

Articles

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Apr 15

• 168

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Mar 15

• 6

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Aug 22, 2023

• 27

Putting ethical principles at the core of research lifecycle

May 19, 2022

Leyo's activity

upvoted an article 4 months ago

Article

Docmatix - a huge dataset for Document Visual Question Answering

Jul 18

• 68

upvoted an article 6 months ago

Article

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Apr 15

• 168

upvoted an article 7 months ago

Article

Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task

May 16

• 17

upvoted a paper 7 months ago

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3 • 100

upvoted a collection 8 months ago

Idefics2 🐶

Collection

Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 89

upvoted 2 papers 9 months ago

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Paper • 2403.09029 • Published Mar 14 • 54

PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter

Paper • 2402.10896 • Published Feb 16 • 15

upvoted a paper 11 months ago

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Paper • 2312.14238 • Published Dec 21, 2023 • 14

upvoted 6 papers about 1 year ago

upvoted 4 papers over 1 year ago

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

Paper • 2308.01907 • Published Aug 3, 2023 • 11

Retentive Network: A Successor to Transformer for Large Language Models

Paper • 2307.08621 • Published Jul 17, 2023 • 170

Generative Pretraining in Multimodality

Paper • 2307.05222 • Published Jul 11, 2023 • 21

OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

Paper • 2306.16527 • Published Jun 21, 2023 • 47