πŸ‘οΈ GLaMM-RegCap-VG


πŸ“ Description

GLaMM-RegCap-VG is the GLaMM variant for region-level captioning, finetuned on Visual Genome. The "RegCap-VG" suffix names that specialization: region-level captioning (RegCap) tuned on the Visual Genome (VG) dataset.

💻 Download

To download GLaMM-RegCap-VG, enable Git LFS and clone the model repository:

git lfs install
git clone https://huggingface.co/MBZUAI/GLaMM-RegCap-VG
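
As an alternative to the Git commands above, the same repository can probably be fetched from Python. The sketch below is not part of the original instructions: it assumes the `huggingface_hub` package is installed, and the function name `download_checkpoint` and the default target directory are hypothetical choices for illustration.

```python
from pathlib import Path

# Repository id taken from the clone URL above.
REPO_ID = "MBZUAI/GLaMM-RegCap-VG"


def download_checkpoint(local_dir: str = "GLaMM-RegCap-VG") -> Path:
    """Download every file of the model repository into local_dir."""
    # Assumption: huggingface_hub is installed (pip install huggingface_hub).
    # It is imported lazily so this module loads even without the package.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the local folder holding the files.
    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)


if __name__ == "__main__":
    # Running the script triggers the actual (large) download.
    print(download_checkpoint())
```

Nothing is downloaded on import; the transfer only starts when `download_checkpoint()` is called or the file is run as a script.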

📚 Additional Resources

📜 Citations and Acknowledgments

  @article{hanoona2023GLaMM,
          title={GLaMM: Pixel Grounding Large Multimodal Model},
          author={Rasheed, Hanoona and Maaz, Muhammad and Shaji, Sahal and Shaker, Abdelrahman and Khan, Salman and Cholakkal, Hisham and Anwer, Rao M. and Xing, Eric and Yang, Ming-Hsuan and Khan, Fahad S.},
          journal={ArXiv 2311.03356},
          year={2023}
  }