4 25 72

Umitcan Sahin PRO

ucsahin

AI & ML interests

Visual Language Models, Large Language Models, Vision Transformers

Recent Activity

Reacted to merve's post with 🔥 3 days ago

The authors of ColPali trained a retrieval model based on SmolVLM 🤠 https://huggingface.co/vidore/colsmolvlm-alpha TLDR; - ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks - ColSmolVLM is more memory efficient than ColQwen2 💗

Reacted to merve's post with 👀 3 days ago

Reacted to merve's post with 👍 3 days ago

View all activity

Organizations

None yet

Posts 2

Post

3661

🚀 Introducing TraVisionLM: Turkish Visual Language Model - The First of Its Kind! 🇹🇷🖼️

I'm thrilled to share TraVisionLM on Hugging Face! With 875M parameters, this lightweight, efficient model handles Turkish instructions for image inputs. Fully compatible with the Transformers library, it’s easy to load, fine-tune, and use—no external libraries needed!

Developed solo, TraVisionLM is a strong foundation for low-resource language research. While still improving, it's a key step for Turkish-language AI. Your feedback is welcome as I refine the model.

🎉 Explore it now:

- Model: ucsahin/TraVisionLM-base
- Demo: https://huggingface.co/spaces/ucsahin/TraVisionLM-Turkish_Visual_Language_Model
- Object Detection Finetune: ucsahin/TraVisionLM-Object-Detection-ft

Let’s push Turkish visual language processing forward!

---

🚀 TraVisionLM: Türünün İlk Örneği Türkçe Görsel Dil Modelini Sunuyorum! 🇹🇷🖼️

TraVisionLM modelini Hugging Face'te yayınladım! 875M parametre ile bu hafif ve verimli model, görüntüye dayalı Türkçe talimatları işlemek için tasarlandı. Transformers kütüphanesiyle tamamen uyumlu, yüklemesi, eğitmesi ve kullanması çok kolay—dış kütüphane gerekmez!

Tek başıma geliştirdiğim TraVisionLM, düşük kaynaklı dillerde araştırmalar için sağlam bir temel sunuyor. Geliştirmeye devam ederken geri bildirimlerinizi bekliyorum.

🎉 Hemen keşfedin:

- Model: ucsahin/TraVisionLM-base
- Demo: https://huggingface.co/spaces/ucsahin/TraVisionLM-Turkish_Visual_Language_Model
- Obje Tespiti İnce Ayarı: ucsahin/TraVisionLM-Object-Detection-ft

Türkçe görsel dil işleme sınırlarını birlikte zorlayalım!

Post

3640

Florence-2 has a great capability of detecting various objects in a zero-shot setting with the task prompt "<OD>". However, if you want to detect specific objects that the base model is not able to in its current form, you can easily finetune it for this particular task. Below I show how to finetune the model to detect tables in a given image, but a similar process can be applied to detect any objects. Thanks to @andito , @merve , and @SkalskiP for sharing the fix for finetuning the Florence-2 model. Please also check their great blog post at https://huggingface.co/blog/finetune-florence2.

Colab notebook: https://colab.research.google.com/drive/1Y8GVjwzBIgfmfD3ZypDX5H1JA_VG0YDL?usp=sharing
Finetuned model: ucsahin/Florence-2-large-TableDetection