---
license: apache-2.0
datasets:
- imagenet-1k
- ade20k
metrics:
- accuracy
- mIoU
pipeline_tag: image-classification
---

# VisionLLaMA-Base-MAE

VisionLLaMA-Base-MAE is pretrained on ImageNet-1k without labels under the Masked Autoencoder (MAE) paradigm. It yields substantial improvements on ImageNet-1k classification (both supervised fine-tuning and linear probing) and on ADE20K semantic segmentation.

| Model | ImageNet Acc (SFT, %) | ImageNet Acc (Linear Probe, %) | ADE20K mIoU (%) |
| -- | -- | -- | -- |
| VisionLLaMA-Base-MAE (ep800) | 84.0 | 69.7 | 49.0 |
| VisionLLaMA-Base-MAE (ep1600) | 84.3 | 71.7 | 50.2 |
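At a high level, MAE pretraining hides a large random subset of image patches and trains the encoder to reconstruct them from the visible ones. The minimal sketch below illustrates only the random patch-masking step with NumPy; it is not the VisionLLaMA implementation (see the GitHub repository for that), and the 75% mask ratio and 14×14 patch grid are assumptions taken from the standard MAE setup.

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, seed=0):
    """Keep a random subset of patch tokens, MAE-style.

    patches: (num_patches, dim) array of patch embeddings.
    Returns the visible patches, the kept indices, and a binary
    mask (1 = masked out, 0 = kept) used by the reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    num_keep = int(n * (1 - mask_ratio))
    # Shuffle patch indices and keep the first num_keep of them.
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:num_keep])
    mask = np.ones(n, dtype=np.int64)
    mask[keep_idx] = 0
    return patches[keep_idx], keep_idx, mask

# Example: 196 patches (14x14 grid for a 224x224 image, patch size 16).
patches = np.random.default_rng(1).normal(size=(196, 768))
visible, keep_idx, mask = random_masking(patches)
print(visible.shape)  # (49, 768): only 25% of patches reach the encoder
```

Only the visible 25% of tokens are fed through the encoder, which is what makes MAE pretraining comparatively cheap; a lightweight decoder then reconstructs the masked patches.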


# How to Use

Please refer to the [GitHub](https://github.com/Meituan-AutoML/VisionLLaMA) repository for usage instructions.

# Citation

```bibtex
@article{chu2024visionllama,
  title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks},
  author={Chu, Xiangxiang and Su, Jianlin and Zhang, Bo and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.00522},
  year={2024}
}
```