Varun Sakunia

Varun-08

AI & ML interests

Python, Machine Learning, Deep Learning, Computer Vision

Organizations

None yet

Varun-08's activity

upvoted a paper 1 day ago

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Paper • 2501.03218 • Published 4 days ago • 28

upvoted a paper 11 days ago

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Paper • 2412.19326 • Published 15 days ago • 18

upvoted a paper 14 days ago

Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis

Paper • 2412.01819 • Published Dec 2, 2024 • 34

liked a Space 16 days ago

Jupyter Agent

upvoted a collection 16 days ago

ModernBERT

Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 22 days ago • 120

liked a model 24 days ago

google/siglip-so400m-patch14-384

Zero-Shot Image Classification • Updated Sep 26, 2024 • 3.06M • 424

liked a model 27 days ago

matteogeniaccio/phi-4

Updated about 14 hours ago • 53.7k • 187

liked a model 28 days ago

meta-llama/Llama-3.3-70B-Instruct

Text Generation • Updated 20 days ago • 393k • 1.56k

upvoted a collection about 1 month ago

PaliGemma 2 Release

Vision-Language Models available in multiple 3B, 10B and 28B variants. • 23 items • Updated 28 days ago • 125

upvoted a collection about 2 months ago

Models for dataset curation

9 items • Updated Dec 5, 2024 • 17

liked a Space about 2 months ago

Qwen2.5 Turbo 1M Demo

liked a model about 2 months ago

NexaAIDev/OmniVLM-968M

Updated 25 days ago • 1.68k • 493

upvoted a paper 2 months ago

Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination

Paper • 2411.03823 • Published Nov 6, 2024 • 43

liked a dataset 2 months ago

wyu1/Leopard-Instruct

Viewer • Updated Nov 8, 2024 • 1.03M • 119k • 55

liked a model 3 months ago

microsoft/OmniParser

Image-Text-to-Text • Updated Dec 2, 2024 • 1.17k • 1.52k

upvoted a collection 3 months ago

Llama 3.2

This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 554

liked a model 3 months ago

genmo/mochi-1-preview

Text-to-Video • Updated 23 days ago • 39.6k • 1.14k

upvoted a collection 3 months ago

DocLayout-YOLO

Dataset and model for DocLayout-YOLO • 9 items • Updated Oct 22, 2024 • 12

upvoted an article 3 months ago

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Article • Sep 18, 2024 • 215

liked a dataset 3 months ago

stanfordnlp/imdb

Viewer • Updated Jan 4, 2024 • 100k • 66.7k • 266