---
license: apache-2.0
datasets:
- imagenet-1k
metrics:
- accuracy
pipeline_tag: image-classification
---

# VisionLLaMA-Large-MAE

Following the Masked Autoencoder (MAE) paradigm, the VisionLLaMA-Large-MAE model is pretrained on ImageNet-1K without labels. It shows improvements on ImageNet-1K classification tasks under both supervised fine-tuning (SFT) and linear probing (a minimal sketch of the linear-probing setup appears after the citation below).

| Model | ImageNet Acc (SFT) | ImageNet Acc (Linear Probe) |
| -- | -- | -- |
| VisionLLaMA-Large-MAE (ep800) | 85.5 | 77.3 |

# How to Use

Please refer to the [GitHub](https://github.com/Meituan-AutoML/VisionLLaMA) page for usage.

# Citation

```
@article{chu2024visionllama,
  title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks},
  author={Chu, Xiangxiang and Su, Jianlin and Zhang, Bo and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.00522},
  year={2024}
}
```
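Linear probing evaluates the pretrained encoder by freezing all of its weights and training only a linear classifier on top of its features. Below is a minimal sketch of that setup in plain PyTorch, assuming a 1024-dimensional feature output for the Large model; the commented-out `build_visionllama_large` builder and checkpoint filename are hypothetical placeholders, not the repository's actual API. See the GitHub page above for the real training and evaluation scripts.

```python
# Minimal linear-probing sketch (assumptions noted below; not the
# repository's actual API).
import torch
import torch.nn as nn

# Hypothetical: load the MAE-pretrained encoder, assumed to output one
# 1024-d feature vector per image for the Large model.
# from visionllama import build_visionllama_large
# backbone = build_visionllama_large(pretrained="visionllama_large_mae_ep800.pth")
backbone = nn.Identity()  # placeholder so the sketch runs stand-alone

# Freeze the pretrained encoder: only the linear head is trained.
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Linear(1024, 1000)  # 1000 ImageNet-1K classes
optimizer = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One linear-probe step: frozen features -> linear head -> CE loss."""
    with torch.no_grad():  # encoder stays frozen
        feats = backbone(images)
    logits = head(feats)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random tensors standing in for an ImageNet batch.
# The inputs are already 1024-d here because the placeholder backbone
# is an identity map.
dummy_feats = torch.randn(8, 1024)
dummy_labels = torch.randint(0, 1000, (8,))
print(train_step(dummy_feats, dummy_labels))
```

Because only the linear head receives gradients, this protocol measures how linearly separable the frozen MAE features are, which is what the "Linear Probe" column in the table above reports.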