---
license: apache-2.0
datasets:
- imagenet-1k
- ade20k
metrics:
- accuracy
- mIoU
pipeline_tag: image-classification
---
# VisionLLaMA-Base-MAE
Trained on ImageNet-1k without labels under the Masked Autoencoder (MAE) paradigm, VisionLLaMA-Base-MAE delivers substantial improvements on ImageNet-1K classification (both supervised fine-tuning and linear probing) and on ADE20K semantic segmentation.
| Model | ImageNet Top-1 Acc (SFT, %) | ImageNet Top-1 Acc (Linear Probe, %) | ADE20K Segmentation (mIoU) |
| -- | -- | -- | -- |
| VisionLLaMA-Base-MAE (ep800) | 84.0 | 69.7 | 49.0 |
| VisionLLaMA-Base-MAE (ep1600) | 84.3 | 71.7 | 50.2 |
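Linear probing (the "Linear Probe" column above) keeps the pretrained encoder frozen and trains only a linear classifier on its features. The sketch below illustrates that protocol with a stand-in random encoder and toy data, not the actual VisionLLaMA weights or training recipe (see the GitHub repository for those):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "frozen encoder": a fixed random projection plus tanh.
# In real linear probing this would be the pretrained MAE encoder.
W_enc = rng.normal(size=(64, 16)) / 8.0  # scaled to avoid tanh saturation

def encode(x):
    return np.tanh(x @ W_enc)  # weights are never updated

# Toy binary-classification data standing in for images/labels.
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(float)

feats = encode(X)  # extract features once from the frozen encoder

# Train only the linear head (logistic regression via gradient descent).
W = np.zeros(16)
b = 0.0
for _ in range(500):
    logits = feats @ W + b
    p = 1.0 / (1.0 + np.exp(-logits))
    grad = p - y
    W -= 0.1 * feats.T @ grad / len(y)
    b -= 0.1 * grad.mean()

acc = ((feats @ W + b > 0) == (y > 0)).mean()
print(f"linear-probe accuracy on toy data: {acc:.2f}")
```

Because the encoder stays fixed, linear-probe accuracy is commonly read as a measure of the quality of the pretrained representation itself, which is why it is reported separately from full fine-tuning.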
# How to Use
Please refer to the [GitHub](https://github.com/Meituan-AutoML/VisionLLaMA) repository for usage instructions.
# Citation
```bibtex
@article{chu2024visionllama,
  title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks},
  author={Chu, Xiangxiang and Su, Jianlin and Zhang, Bo and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.00522},
  year={2024}
}
```