Ross Wightman

rwightman

AI & ML interests

Computer vision, transfer learning, semi/self supervised learning, robotics.

Recent Activity

New activity 5 days ago

timm/ViT-B-16-SigLIP-i18n-256: Are the languages that are supported documented anywhere?

updated a model 7 days ago

timm/mobilenetv4_conv_medium.e180_ad_r384_in12k

Reacted to jeffboudier's post with 🤗 10 days ago

New - add your bluesky account to your HF profile: https://huggingface.co/settings/profile

Is the grass greener, the sky bluer? Will try and figure it out at https://bsky.app/profile/jeffboudier.bsky.social

By the way, HF people starter pack: https://bsky.app/starter-pack/huggingface.bsky.social/3laz5x7naiz22

Articles

Trick or ResNet Treat

Mamba Out

Tiny Test Models

Searching for better (Full) ImageNet ViT Baselines

MobileNet Baselines

MobileNet-V4 (now in timm)

rwightman's activity

upvoted an article 10 days ago

🤗 Serve any model with Inference Endpoints + Custom Handlers

Article • 10 days ago • 3

upvoted 2 collections 2 months ago

RDNet

DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs [ECCV 2024] • 9 items • Updated Oct 16 • 2

timm tiny test models

A collection of very small (~300-500k parameter) models at 160x160 resolution, for testing purposes. Trained on ImageNet-1k. • 13 items • Updated Oct 2 • 3

upvoted 2 articles 4 months ago

MobileNet Baselines

Article • Jul 26 • 23

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?

Article • Jul 25 • 18

upvoted a collection 4 months ago

🍃 MINT-1T

Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24 • 54

upvoted 2 papers 5 months ago

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 68

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24 • 57

upvoted a collection 5 months ago

Cambrian Data

3 items • Updated Jun 25 • 9

upvoted a paper 6 months ago

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11 • 55

upvoted 2 collections 6 months ago

MobileCLIP Models + DataCompDR Data

MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated Oct 4 • 25

MobileNetV4 pretrained weights

Weights for MobileNet-V4 pretrained in timm • 17 items • Updated Sep 22 • 18

upvoted 2 papers 6 months ago

MobileNetV4 -- Universal Models for the Mobile Ecosystem

Paper • 2404.10518 • Published Apr 16 • 2

On the Efficiency of Convolutional Neural Networks

Paper • 2404.03617 • Published Apr 4 • 4

upvoted an article 6 months ago

MobileNet-V4 (now in timm)

Article • Jun 17 • 39

upvoted 2 articles 7 months ago

Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task

Article • May 16 • 17

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Article • May 14 • 213

upvoted 3 collections 7 months ago

PaliGemma Release

Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 31 • 139

PaliGemma FT Models

108 items • Updated Jul 31 • 28

Searching for Better ViT Baselines

Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 25 items • Updated Aug 21 • 13