dkapt (Dimitrios Kapetanios)

upvoted a paper about 22 hours ago

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

Paper • 2411.04952 • Published 3 days ago • 20

upvoted an article 2 days ago

Article

ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models

23 days ago

• 14

upvoted a paper about 1 month ago

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8 • 107

upvoted an article about 1 month ago

Article

Document Similarity Search with ColPali

Sep 21

• 46

upvoted a collection about 1 month ago

Qwen2-VL

Collection

Vision-language model series based on Qwen2 • 15 items • Updated Sep 18 • 149

upvoted a paper about 1 month ago

Flamingo: a Visual Language Model for Few-Shot Learning

Paper • 2204.14198 • Published Apr 29, 2022 • 14

upvoted an article about 1 month ago

Article

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Aug 22, 2023

• 26

upvoted a paper about 2 months ago

ColPali: Efficient Document Retrieval with Vision Language Models

Paper • 2407.01449 • Published Jun 27 • 41

upvoted a collection 4 months ago

Llama 3.1

Collection

This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Sep 25 • 613

upvoted 4 articles 6 months ago

Article

Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task

May 16

• 17

Article

Vision Language Models Explained

Apr 11

• 209

Article

A Dive into Pretraining Strategies for Vision-Language Models

Feb 3, 2023

• 46

Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

May 14

• 206

upvoted a paper 8 months ago

Evaluating Frontier Models for Dangerous Capabilities

Paper • 2403.13793 • Published Mar 20 • 7
