ucsahin (Umitcan Sahin)

upvoted a collection about 2 months ago

Turkish Vision-Language Datasets

Collection

Collection of Turkish vision-language datasets. • 17 items • Updated 13 days ago • 4

upvoted 5 papers 3 months ago

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6 • 59

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5 • 60

upvoted a collection 3 months ago

Vision Language Leaderboards

Collection

This collection has all the vision language leaderboards. • 7 items • Updated Aug 24 • 9

upvoted an article 3 months ago

Article

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

Jul 31

• 59

upvoted an article 4 months ago

Article

The Rise of Agentic Data Generation

Jul 15

• 75

upvoted 2 papers 4 months ago

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19 • 42

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9 • 41

upvoted a collection 4 months ago

🪐 SmolLM

Collection

A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 192

upvoted 2 articles 4 months ago

Article

TGI Multi-LoRA: Deploy Once, Serve 30 Models

Jul 18

• 47

Article

Docmatix - a huge dataset for Document Visual Question Answering

Jul 18

• 66

upvoted 4 papers 4 months ago

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

Paper • 2407.08770 • Published Jul 11 • 19

AgentInstruct: Toward Generative Teaching with Agentic Flows

Paper • 2407.03502 • Published Jul 3 • 44

Multi-Object Hallucination in Vision-Language Models

Paper • 2407.06192 • Published Jul 8 • 9

ColPali: Efficient Document Retrieval with Vision Language Models

Paper • 2407.01449 • Published Jun 27 • 41

upvoted an article 5 months ago

Article

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

Jun 24

• 177

upvoted a paper 5 months ago

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Paper • 2406.09406 • Published Jun 13 • 13


VITA: Towards Open-Source Interactive Omni Multimodal LLM

SAM 2: Segment Anything in Images and Videos

Gemma 2: Improving Open Language Models at a Practical Size
