Merve Noyan

merve

AI & ML interests

VLMs, vision & co

Recent Activity

Articles

Organizations

Hugging Face's profile picture Google's profile picture Deprem Yapay Zeka's profile picture Notebooks-explorers's profile picture SODA's profile picture Deprem Private's profile picture PyTorch Image Models's profile picture Turkish NLP Dataset Creators's profile picture Templates's profile picture Demo Crafters ๐Ÿค— 's profile picture Keras's profile picture tensorflow's profile picture Mukayese's profile picture HugGAN Community's profile picture EPFL VILAB's profile picture Hugging Face Fellows's profile picture Huggingface.js's profile picture scikit-learn's profile picture JAX โ™ฅ๏ธ Diffusers ๐Ÿงจ's profile picture HuggingFaceM4's profile picture 2023 Jan Offsite hackathon's profile picture HF Canonical Model Maintainers's profile picture scikit-learn's profile picture Huggingface Projects's profile picture fastai X Hugging Face Group 2022's profile picture boun-tabi-LMG's profile picture skops-tests's profile picture Kornia AI's profile picture Hugging Face H4's profile picture Keras Dreambooth Event's profile picture Turkish T5 - BERT - GPT-2's profile picture Blog-explorers's profile picture Hugging Face for Computer Vision's profile picture Hacktoberfest 2023's profile picture Hugging Face TB Research's profile picture adept-hf-collab's profile picture ZeroGPU Explorers's profile picture kotol's profile picture Magic Leap Community's profile picture Llava Hugging Face's profile picture MLX Community's profile picture Social Post Explorers's profile picture Top Contributors: Profile Followers's profile picture Dev Mode Explorers's profile picture Paris AI Running Club's profile picture yorg's profile picture CVPR2024's profile picture Les papiers de Merve's profile picture nltpt's profile picture s0409's profile picture Hugging Face FineVideo's profile picture mv's profile picture Cookbook Authors's profile picture open/ acc's profile picture Agents's profile picture

merve's activity

posted an update 3 days ago
view post
Post
1891
The authors of ColPali trained a retrieval model based on SmolVLM ๐Ÿค  vidore/colsmolvlm-alpha
TLDR;

- ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks

- ColSmolVLM is more memory efficient than ColQwen2 ๐Ÿ’—
posted an update 4 days ago
view post
Post
3579
Small yet mighty! ๐Ÿ’ซ

We are releasing SmolVLM: a new 2B small vision language made for on-device use, fine-tunable on consumer GPU, immensely memory efficient ๐Ÿค 

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO ๐Ÿ’
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO ๐Ÿ’—
posted an update 8 days ago
view post
Post
2496
What a week! A recap for everything you missed โ„๏ธ
merve/nov-22-releases-673fbbcfc1c97c4f411def07
Multimodal โœจ
> Mistral AI
released Pixtral 124B, a gigantic open vision language model
> Llava-CoT (formerly known as Llava-o1) was released, a multimodal reproduction of o1 model by PKU
> OpenGVLab released MMPR: a new multimodal reasoning dataset
> Jina has released Jina-CLIP-v2 0.98B multilingual multimodal embeddings
> Apple released new SotA vision encoders AIMv2

LLMs ๐Ÿฆ™
> AllenAI dropped a huge release of models, datasets and scripts for Tรผlu, a family of models based on Llama 3.1 aligned with SFT, DPO and a new technique they have developed called RLVR
> Jina has released embeddings-v3: new multilingual embeddings with longer context
> Hugging Face released SmolTalk: synthetic dataset used to align SmolLM2 using supervised fine-tuning
> Microsoft released orca-agentinstruct-1M-v1: a gigantic instruction dataset of 1M synthetic instruction pairs

Image Generation ๐Ÿ–ผ๏ธ
> Black Forest Labs released Flux 1. tools: four new models for different image modifications and two LoRAs to do image conditioning and better steer generations

Lastly Hugging Face released a new library Observers: a lightweight SDK for monitoring interactions with AI APIs and easily store and browse them ๐Ÿ“š
$ pip install observers
  • 3 replies
ยท
posted an update 8 days ago
view post
Post
1446
Apple released AIMv2 ๐Ÿ a family of state-of-the-art open-set vision encoders
apple/aimv2-6720fe1558d94c7805f7688c
> like CLIP, but add a decoder and train on autoregression ๐Ÿคฏ
> 19 open models come in 300M, 600M, 1.2B, 2.7B with resolutions of 224, 336, 448
> Load and use with ๐Ÿค— transformers
posted an update 9 days ago
view post
Post
2892
your hugging face profile now has your recent activities ๐Ÿค—
posted an update 12 days ago
Reacted to sayakpaul's post with โค๏ธ 12 days ago
view post
Post
2409
It's been a while we shipped native quantization support in diffusers ๐Ÿงจ

We currently support bistandbytes as the official backend but using others like torchao is already very simple.

This post is just a reminder of what's possible:

1. Loading a model with a quantization config
2. Saving a model with quantization config
3. Loading a pre-quantized model
4. enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints

Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes
  • 1 reply
ยท
Reacted to BlinkDL's post with ๐Ÿ”ฅ 12 days ago
view post
Post
2994
RWKV-6-world-v3 (+3.1T tokens) is our best multilingual 7B model as of now: BlinkDL/rwkv-6-world

It's 100% RNN and attention-free. MMLU 54.2% (previous world-v2.1 = 47.9%. note: without eval-boosting tricks such as annealing).

RWKV-7-world-v4 soon :)
Reacted to davidberenstein1957's post with ๐Ÿ”ฅ๐Ÿ‘€ 12 days ago
view post
Post
1898
For anyone who struggles with NER or information extraction with LLM.

We showed an efficient workflow for token classification including zero-shot suggestions and model fine-tuning with Argilla, GliNER, the NuMind NuExtract LLM and SpanMarker. @argilla

Video: https://youtu.be/JvLpaYgNd84?feature=shared
Notebooks and slides included to try it yourself ๐Ÿ™‚
Reacted to erikkaum's post with ๐Ÿ‘€๐Ÿ”ฅ 12 days ago
view post
Post
1675
A while ago I started experimenting with compiling the Python interpreter to WASM.

To build a secure, fast, and lightweight sandbox for code execution โ€” ideal for running LLM-generated Python code.

- Send code simply as a POST request
- 1-2ms startup times

Hack away:
https://github.com/ErikKaum/runner
Reacted to AdinaY's post with ๐Ÿ‘€ 12 days ago
Reacted to prithivMLmods's post with ๐Ÿค— 12 days ago
view post
Post
3909
Minimalistic Adapters ๐ŸŽƒ

๐Ÿš€Demo Here:
prithivMLmods/FLUX-LoRA-DLC

๐Ÿš€Model:
{ Quote Tuner } : prithivMLmods/Flux.1-Dev-Quote-LoRA
{ Stamp Art } : prithivMLmods/Flux.1-Dev-Stamp-Art-LoRA
{ Hand Sticky } : prithivMLmods/Flux.1-Dev-Hand-Sticky-LoRA
{ Poster HQ } : prithivMLmods/Flux.1-Dev-Poster-HQ-LoRA
{ Ctoon Min } : prithivMLmods/Flux.1-Dev-Ctoon-LoRA

๐Ÿš€Collection:
{ Flux LoRA Collection} : prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
{ LoRA Space Collection } : prithivMLmods/lora-space-collections-6714b72e0d49e1c97fbd6a32

๐Ÿš€For More Visit
https://huggingface.co/strangerzonehf
.
.
.
๐Ÿค—@prithivMLmods
  • 3 replies
ยท
Reacted to hexgrad's post with ๐Ÿ”ฅ 13 days ago
posted an update 14 days ago
view post
Post
4816
OmniVision-968M: a new local VLM for edge devices, fast & small but performant
๐Ÿ’จ a new vision language model with 9x less image tokens, super efficient
๐Ÿ“– aligned with DPO for reducing hallucinations
โšก๏ธ Apache 2.0 license ๐Ÿ”ฅ

Demo hf.co/spaces/NexaAIDev/omnivlm-dpo-demo
Model NexaAIDev/omnivision-968M
  • 4 replies
ยท
Reacted to AdinaY's post with ๐Ÿ‘€๐Ÿ”ฅ 15 days ago
view post
Post
2519
Letโ€™s dive into the exciting releases from the Chinese community last week ๐Ÿ”ฅ๐Ÿš€
More details ๐Ÿ‘‰ https://huggingface.co/zh-ai-community

Code model:
โœจQwen 2.5 coder by Alibaba Qwen
Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f
โœจOpenCoder by InflyAI - Fully open code model๐Ÿ™Œ
infly/opencoder-672cec44bbb86c39910fb55e

Image model:
โœจHunyuan3D-1.0 by Tencent
tencent/Hunyuan3D-1

MLLM:
โœจJanusFlow by DeepSeek
deepseek-ai/JanusFlow-1.3B
deepseek-ai/JanusFlow-1.3B
โœจMono-InternVL-2B by OpenGVlab
OpenGVLab/Mono-InternVL-2B

Video model:
โœจCogVideoX 1.5 by ChatGLM
THUDM/CogVideoX1.5-5B-SAT

Audio model:
โœจFish Agent by FishAudio
fishaudio/fish-agent-v0.1-3b

Dataset:
โœจOPI dataset by BAAIBeijing
BAAI/OPI
posted an update 15 days ago
view post
Post
1953
Amazing past days at open ML, it's raining coding models, let's have a recap ๐ŸŒง๏ธ Find all models and datasets here merve/nov-15-releases-67372d0ebdc354756a52ecd0

Models
๐Ÿ’ป Coding: Qwen team released two Qwen2.5-Coder checkpoints of 32B and 7B. Infly released OpenCoder: 1.5B and 8B coding models with instruction SFT'd versions and their datasets! ๐Ÿ’—

๐Ÿ–ผ๏ธ Image/Video Gen: Alibaba vision lab released In-context LoRA -- 10 LoRA models on different themes based on Flux. Also Mochi the sota video generation model with A2.0 license now comes natively supported in diffusers ๐Ÿ‘

๐Ÿ–ผ๏ธ VLMs/Multimodal: NexaAIDev released Omnivision 968M a new vision language model aligned with DPO for reducing hallucinations, also comes with GGUF ckpts ๐Ÿ‘ Microsoft released LLM2CLIP, a new CLIP-like model with longer context window allowing complex text inputs and better search

๐ŸŽฎ AGI?: Etched released Oasis 500M, a diffusion based open world model that takes keyboard input and outputs gameplay ๐Ÿคฏ

Datasets
Common Corpus: A text dataset with 2T tokens with permissive license for EN/FR on various sources: code, science, finance, culture ๐Ÿ“–
Reacted to maxiw's post with ๐Ÿ‘ 16 days ago
view post
Post
4581
I was curious to see what people post here on HF so I created a dataset with all HF Posts: maxiw/hf-posts

Some interesting stats:

Top 5 Authors by Total Impressions:
-----------------------------------
@merve : 171,783 impressions (68 posts)
@fdaudens : 135,253 impressions (81 posts)
@singhsidhukuldeep : 122,591 impressions (81 posts)
@akhaliq : 119,526 impressions (78 posts)
@MonsterMMORPG : 112,500 impressions (45 posts)

Top 5 Users by Number of Reactions Given:
----------------------------------------
@osanseviero : 1278 reactions
@clem : 910 reactions
@John6666 : 899 reactions
@victor : 674 reactions
@samusenps : 655 reactions

Top 5 Most Used Reactions:
-------------------------
โค๏ธ: 7048 times
๐Ÿ”ฅ: 5921 times
๐Ÿ‘: 4856 times
๐Ÿš€: 2549 times
๐Ÿค—: 2065 times
ยท