GeoChat-7B

GeoChat is the first grounded large vision-language model tailored specifically to remote sensing (RS) scenarios. Unlike general-domain models, GeoChat handles high-resolution RS imagery and uses region-level reasoning to interpret a scene comprehensively. It is built on the LLaVA-1.5 architecture and fine-tuned on a newly created RS multimodal dataset. The result is robust zero-shot performance across a range of RS tasks: image and region captioning, visual question answering, scene classification, visually grounded conversation, and referring object detection.

  • Developed by MBZUAI
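As a minimal sketch of how the checkpoint files might be fetched locally, the snippet below uses `snapshot_download` from the `huggingface_hub` package. It assumes that package is installed and that the weights are hosted under the repository name `MBZUAI/geochat-7B`; the helper name `download_geochat` is hypothetical and introduced only for illustration. It downloads files only. Loading the weights and running inference is not covered here.

```python
from pathlib import Path

# Assumption: the repository name under which this model card is hosted.
REPO_ID = "MBZUAI/geochat-7B"


def download_geochat(cache_dir=None):
    """Download every file in the model repository and return the local folder.

    Hypothetical helper for illustration; it only fetches files and does not
    load the model.
    """
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the local snapshot as a string.
    local_dir = snapshot_download(repo_id=REPO_ID, cache_dir=cache_dir)
    return Path(local_dir)
```

Calling `download_geochat()` returns the directory holding the downloaded files; passing `cache_dir` places the download in a location of your choice instead of the default cache.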

Model Sources

BibTeX:

@misc{kuckreja2023geochat,
      title={GeoChat: Grounded Large Vision-Language Model for Remote Sensing}, 
      author={Kartik Kuckreja and Muhammad Sohail Danish and Muzammal Naseer and Abhijit Das and Salman Khan and Fahad Shahbaz Khan},
      year={2023},
      eprint={2311.15826},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}  

Authors

Kartik Kuckreja, Muhammad Sohail Danish

Contact

kartik.kuckreja@mbzuai.ac.ae
