---
license: cc-by-4.0
datasets:
- RussRobin/SpatialQA
language:
- en
tags:
- Embodied AI
- MLLM
- VLM
- Spatial Understanding
- Phi-2
pipeline_tag: visual-question-answering
---

SpatialBot is a VLM with spatial understanding and reasoning abilities. It precisely understands depth maps and uses them to carry out high-level tasks. In this HF repo, we provide checkpoints of SpatialBot-3B with LoRA, which is based on Phi-2 and SigLIP. It performs well on general VLM tasks and on spatial understanding benchmarks such as SpatialBench.

You will also need to download the [pretrained CKPT](https://huggingface.co/RussRobin/SpatialBot-3B-pretrain).

### Paper: https://arxiv.org/abs/2406.13642

### GitHub repo: https://github.com/BAAI-DCAI/SpatialBot

### SpatialBench, the benchmark: https://huggingface.co/datasets/RussRobin/SpatialBench

### Merged SpatialBot-3B: https://huggingface.co/RussRobin/SpatialBot-3B
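Because the LoRA checkpoints in this repo are used together with the pretrained checkpoint, both have to be fetched. Below is a minimal sketch of downloading them with `huggingface_hub.snapshot_download`. It rests on a few assumptions. The ID of this LoRA repo is not stated here, so it is passed in as a parameter. The one-folder-per-repo layout under `checkpoints/` is a hypothetical choice made for this example. How the two checkpoints are then combined for inference follows the GitHub repo, not this sketch.

```python
from pathlib import Path

# Repo ID taken from the "pretrained CKPT" link in this model card.
PRETRAIN_REPO = "RussRobin/SpatialBot-3B-pretrain"


def checkpoint_dirs(lora_repo_id: str, root: str = "checkpoints") -> dict[str, Path]:
    """Map each Hub repo ID to the local folder it will be downloaded into.

    Assumption: one sub-folder per repo, named after the repo. This layout is
    hypothetical and not required by SpatialBot itself.
    """
    base = Path(root)
    return {
        repo_id: base / repo_id.split("/")[-1]
        for repo_id in (PRETRAIN_REPO, lora_repo_id)
    }


def download_checkpoints(lora_repo_id: str, root: str = "checkpoints") -> dict[str, Path]:
    """Download the pretrained checkpoint and the LoRA checkpoint from the Hub."""
    # Imported lazily so checkpoint_dirs() can be used without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    dirs = checkpoint_dirs(lora_repo_id, root)
    for repo_id, local_dir in dirs.items():
        # Fetches every file of the repo into local_dir.
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return dirs
```

For example, `download_checkpoints("<lora-repo-id>")` would populate `checkpoints/SpatialBot-3B-pretrain` and a second folder for the LoRA weights. Replace the placeholder with the ID shown at the top of this repo's Hugging Face page.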