billpsomas/vits_supervised_simpool_ep300

billpsomas committed on Dec 1, 2023

Commit b6a8aaf (1 parent: 69e2d8d)

Update README.md

Files changed (1)

README.md +43 -0

@@ -1,3 +1,46 @@
 ---
 license: cc-by-4.0
+datasets:
+- imagenet-1k
+metrics:
+- accuracy
+pipeline_tag: image-classification
+language:
+- en
+tags:
+- vision transformer
+- simpool
+- computer vision
+- deep learning
 ---
+# Supervised ViT-S/16 (small-sized Vision Transformer with patch size 16) model with SimPool
+ViT-S model with SimPool (gamma=1.25) trained on ImageNet-1k for 300 epochs.
+SimPool is a simple attention-based pooling method applied at the end of the network, introduced in this ICCV 2023 [paper](https://arxiv.org/pdf/2309.06891.pdf) and released in this [repository](https://github.com/billpsomas/simpool/).
+Disclaimer: This model card was written by one of the authors of SimPool, [Bill Psomas](http://users.ntua.gr/psomasbill/).
+## Motivation
+Convolutional networks and vision transformers have different forms of pairwise interactions, pooling across layers and pooling at the end of the network. Does the latter really need to be different?
+As a by-product of pooling, vision transformers provide spatial attention for free, but this attention is most often of low quality unless the model is self-supervised, a phenomenon that is not well studied. Is supervision really the problem?
+## Method
+SimPool is a simple attention-based pooling mechanism that replaces the default pooling in both convolutional and transformer encoders. For transformers, we completely discard the [CLS] token.
+Interestingly, we find that, whether supervised or self-supervised, SimPool improves performance on pre-training and downstream tasks and provides attention maps delineating object boundaries in all cases.
+One could thus call SimPool universal.
+## BibTeX entry and citation info
+```
+@misc{psomas2023simpool,
+      title={Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?},
+      author={Bill Psomas and Ioannis Kakogeorgiou and Konstantinos Karantzalos and Yannis Avrithis},
+      year={2023},
+      eprint={2309.06891},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```
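The Method section describes SimPool only in words. The dependency-free Python sketch below illustrates one reading of it for a single image; it is not the official code (see the repository linked above), and several details are assumptions: the query is initialized by global average pooling (GAP) of the raw patch tokens and projected by `w_q`; keys are layer-normalized patch tokens (no learned affine) projected by `w_k`; values are the layer-normalized tokens without a projection, shifted to be positive; attention is a single-head softmax over patches with a 1/sqrt(d) scale; and the output is a generalized weighted (power) mean with exponent `gamma`, defaulting to the gamma=1.25 setting reported for this checkpoint. The function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def layer_norm(x: Sequence[float], eps: float = 1e-6) -> Vector:
    """Normalize one token to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Compute W @ x for a (d_out x d_in) weight matrix W."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simpool(tokens: Sequence[Sequence[float]], w_q: Matrix, w_k: Matrix,
            gamma: float = 1.25, eps: float = 1e-6) -> Vector:
    """Pool p patch tokens of dimension d into one d-dimensional vector.

    Hypothetical sketch, not the official SimPool implementation.
    """
    p, d = len(tokens), len(tokens[0])
    # 1) Query initialized by global average pooling (GAP) over the patches.
    gap = [sum(t[c] for t in tokens) / p for c in range(d)]
    q = matvec(w_q, gap)
    # 2) Keys from layer-normalized patches; values are the normalized
    #    patches themselves (assumption: no value projection).
    normed = [layer_norm(t) for t in tokens]
    keys = [matvec(w_k, t) for t in normed]
    # 3) Scaled dot-product attention over the p patches (no [CLS] token).
    scores = [sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d) for k in keys]
    attn = softmax(scores)
    # 4) Shift values to be strictly positive so the power mean is defined.
    v_min = min(min(t) for t in normed)
    values = [[v - v_min + eps for v in t] for t in normed]
    # 5) Generalized weighted average per channel:
    #    (sum_i a_i * v_i**gamma) ** (1 / gamma).
    return [
        sum(a * v[c] ** gamma for a, v in zip(attn, values)) ** (1.0 / gamma)
        for c in range(d)
    ]


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    patches = [[0.1, 0.5, 0.9], [0.3, 0.2, 0.8], [0.7, 0.4, 0.6], [0.2, 0.9, 0.1]]
    pooled = simpool(patches, w_q=identity, w_k=identity, gamma=1.25)
    print(len(pooled))  # 3
```

With `gamma=1.0` the last step reduces to an ordinary attention-weighted average of the (shifted) values; by the power mean inequality, exponents above 1 pull each output channel toward the patches with larger values in that channel.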