Merve Noyan

merve

AI & ML interests

VLMs, vision & co

Recent Activity

New activity 2 days ago

HuggingFaceTB/SmolVLM: Actual text streaming

posted an update 3 days ago

The authors of ColPali trained a retrieval model based on SmolVLM 🤠 https://huggingface.co/vidore/colsmolvlm-alpha
TL;DR:
- ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks
- ColSmolVLM is more memory efficient than ColQwen2 💗

updated a Space 3 days ago

HuggingFaceTB/SmolVLM

merve's activity

New activity in HuggingFaceTB/SmolVLM 2 days ago

Actual text streaming

#4 opened 3 days ago by cbensimon

posted an update 3 days ago

The authors of ColPali trained a retrieval model based on SmolVLM 🤠 vidore/colsmolvlm-alpha
TL;DR:

- ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks

- ColSmolVLM is more memory efficient than ColQwen2 💗

updated a Space 3 days ago

📊 SmolVLM (Running on Zero)

liked 2 models 4 days ago

HuggingFaceTB/SmolLM2-1.7B

Text Generation • Updated 6 days ago • 16.1k • 80

HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • Updated 3 days ago • 14.7k • 173

liked a Space 4 days ago

📊 SmolVLM (Running on Zero)

posted an update 4 days ago

Small yet mighty! 💫

We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on a consumer GPU, and immensely memory efficient 🤠

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO 💝
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb

updated a model 4 days ago

HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • Updated 3 days ago • 14.7k • 173

New activity in HuggingFaceTB/SmolVLM-Instruct 4 days ago

Revert chat template

#4 opened 4 days ago by merve

New activity in HuggingFaceTB/SmolVLM 5 days ago

Upload rococo.jpg

#2 opened 5 days ago by merve

Upload rococo.jpg

#1 opened 5 days ago by merve

updated a model 5 days ago

HuggingFaceTB/SmolVLM-Base

Image-Text-to-Text • Updated 2 days ago • 1.02k • 12

New activity in HuggingFaceTB/SmolVLM-Base 5 days ago

Add eos token

#2 opened 5 days ago by merve

New activity in HuggingFaceTB/SmolVLM-Base 6 days ago

Added chat_template

#1 opened 6 days ago by merve

updated a model 6 days ago

HuggingFaceTB/SmolVLM-Base

Image-Text-to-Text • Updated 2 days ago • 1.02k • 12

updated a model 8 days ago

HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • Updated 3 days ago • 14.7k • 173

New activity in HuggingFaceTB/SmolVLM-Instruct 8 days ago

Misc improvements

#1 opened 8 days ago by merve

posted an update 8 days ago

What a week! A recap of everything you missed ❄️
merve/nov-22-releases-673fbbcfc1c97c4f411def07
Multimodal ✨
> Mistral AI released Pixtral Large, a gigantic 124B open vision language model
> Llava-CoT (formerly known as Llava-o1) was released, a multimodal reproduction of the o1 model by PKU
> OpenGVLab released MMPR: a new multimodal reasoning dataset
> Jina released Jina-CLIP-v2: 0.98B multilingual multimodal embeddings
> Apple released AIMv2, new SotA vision encoders

LLMs 🦙
> AllenAI dropped a huge release of models, datasets and scripts for Tülu, a family of models based on Llama 3.1 aligned with SFT, DPO and a new technique they developed called RLVR
> Jina released embeddings-v3: new multilingual embeddings with longer context
> Hugging Face released SmolTalk: the synthetic dataset used to align SmolLM2 with supervised fine-tuning
> Microsoft released orca-agentinstruct-1M-v1: a gigantic instruction dataset of 1M synthetic instruction pairs

Image Generation 🖼️
> Black Forest Labs released FLUX.1 Tools: four new models for different image modifications and two LoRAs for image conditioning and better steering of generations

Lastly, Hugging Face released a new library, Observers: a lightweight SDK for monitoring interactions with AI APIs and easily storing and browsing them 📚
$ pip install observers

posted an update 8 days ago

Apple released AIMv2 🍏 a family of state-of-the-art open-set vision encoders
apple/aimv2-6720fe1558d94c7805f7688c
> like CLIP, but adds a decoder and trains with an autoregressive objective 🤯
> 19 open models come in 300M, 600M, 1.2B, 2.7B with resolutions of 224, 336, 448
> Load and use with 🤗 transformers

Articles

SmolVLM - small yet mighty Vision Language Model

Llama can now see and run on your device - welcome Llama 3.2

Preference Optimization for Vision Language Models

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Vision Language Models Explained

Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳

Deploy MusicGen in no time with Inference Endpoints

Open-Source Text Generation & LLM Ecosystem at Hugging Face

Jupyter X Hugging Face

Using Machine Learning to Aid Survivors and Race through Time

Introducing Skops

Announcing the Hugging Face Fellowship Program

Hosting your Models and Datasets on Hugging Face Spaces using Streamlit

Showcase Your Projects in Spaces using Gradio
