---
license: apache-2.0
---

# GLaMM-RegCap-VG

---

## Description

GLaMM-RegCap-VG is a GLaMM model fine-tuned for region-level captioning on the Visual Genome dataset. The "RegCap-VG" suffix names this specialization: region-level captioning (RegCap) tuned on Visual Genome (VG).

## Download

To get started with GLaMM-RegCap-VG, clone this repository with Git LFS enabled so the model files are fetched:

```bash
git lfs install
git clone https://huggingface.co/MBZUAI/GLaMM-RegCap-VG
```

## Additional Resources

- **Paper:** [arXiv](https://arxiv.org/abs/2311.03356).
- **GitHub Repository:** Training code and updates: [GitHub - GLaMM](https://github.com/mbzuai-oryx/groundingLMM).
- **Project Page:** For a detailed overview of the project, visit our [Project Page - GLaMM](https://mbzuai-oryx.github.io/groundingLMM/).

## Citations and Acknowledgments

```bibtex
@article{hanoona2023GLaMM,
  title={GLaMM: Pixel Grounding Large Multimodal Model},
  author={Rasheed, Hanoona and Maaz, Muhammad and Shaji, Sahal and Shaker, Abdelrahman and Khan, Salman and Cholakkal, Hisham and Anwer, Rao M. and Xing, Eric and Yang, Ming-Hsuan and Khan, Fahad S.},
  journal={ArXiv 2311.03356},
  year={2023}
}
```
|