Merve Noyan

merve

AI & ML interests

VLMs, vision & co

Recent Activity

posted an update about 11 hours ago

The authors of ColPali trained a retrieval model based on SmolVLM 🤠 https://huggingface.co/vidore/colsmolvlm-alpha
TL;DR:
- ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks
- ColSmolVLM is more memory efficient than ColQwen2 💗

updated a Space about 13 hours ago

HuggingFaceTB/SmolVLM

liked a model 1 day ago

HuggingFaceTB/SmolLM2-1.7B

Articles

SmolVLM - small yet mighty Vision Language Model

Llama can now see and run on your device - welcome Llama 3.2

Preference Optimization for Vision Language Models

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Vision Language Models Explained

Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳

Deploy MusicGen in no time with Inference Endpoints

Open-Source Text Generation & LLM Ecosystem at Hugging Face

Jupyter X Hugging Face

Using Machine Learning to Aid Survivors and Race through Time

Introducing Skops

Announcing the Hugging Face Fellowship Program

Hosting your Models and Datasets on Hugging Face Spaces using Streamlit

Showcase Your Projects in Spaces using Gradio

Organizations

merve's activity

upvoted 3 papers about 2 months ago

FreeInit: Bridging Initialization Gap in Video Diffusion Models

Paper • 2312.07537 • Published Dec 12, 2023 • 26

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 47

Möbius Transform for Mitigating Perspective Distortions in Representation Learning

Paper • 2405.02296 • Published Mar 7 • 4

upvoted a paper 2 months ago

NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

Paper • 2404.01300 • Published Apr 1 • 4

upvoted an article 2 months ago

Document Similarity Search with ColPali

Article • Sep 21 • 47

upvoted 3 papers 3 months ago

DriveLM: Driving with Graph Visual Question Answering

Paper • 2312.14150 • Published Dec 21, 2023 • 4

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20 • 56

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17 • 51

upvoted a collection 4 months ago

InternVideo2

InternVideo2 • 15 items • Updated Oct 21 • 15

upvoted 5 papers 4 months ago

KAN or MLP: A Fairer Comparison

Paper • 2407.16674 • Published Jul 23 • 42

Meltemi: The first open Large Language Model for Greek

Paper • 2407.20743 • Published Jul 30 • 67

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Paper • 2407.21794 • Published Jul 31 • 5

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 108

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31 • 75

upvoted a collection 4 months ago

SpaceVLMs

Features VLMs fine-tuned for enhanced spatial reasoning using a synthetic data pipeline similar to Spatial VLM. • 3 items • Updated Jul 26 • 1

upvoted 5 papers 4 months ago

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Paper • 2407.12594 • Published Jul 17 • 19

MAVIS: Mathematical Visual Instruction Tuning

Paper • 2407.08739 • Published Jul 11 • 30

SEED-Story: Multimodal Long Story Generation with Large Language Model

Paper • 2407.08683 • Published Jul 11 • 22

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Paper • 2407.08083 • Published Jul 10 • 27

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10 • 40