---
license: cc-by-4.0
datasets:
- RussRobin/SpatialQA
language:
- en
tags:
- Embodied AI
- MLLM
- VLM
- Spatial Understanding
- Phi-2
pipeline_tag: visual-question-answering
---
SpatialBot is a VLM with spatial understanding and reasoning abilities, which it gains by precisely understanding depth maps and using them to perform high-level tasks.
In this Hugging Face repo, we provide the LoRA checkpoints of SpatialBot-3B, which is based on Phi-2 and SigLIP. It can perform well on general VLM tasks and on spatial understanding benchmarks such as SpatialBench.
To use the LoRA checkpoints, you will also need to download the [pretrained checkpoint](https://huggingface.co/RussRobin/SpatialBot-3B-pretrain).
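
As a minimal sketch of fetching both sets of weights, the snippet below uses `snapshot_download` from the `huggingface_hub` package. The pretrained repo ID comes from the link above. The LoRA repo ID `RussRobin/SpatialBot-3B-LoRA`, the `checkpoints` directory, and the helper names `plan_downloads` and `download_checkpoints` are assumptions made for illustration and are not part of the SpatialBot codebase.

```python
from pathlib import Path

# The pretrain repo ID is taken from the link in this README.
# The LoRA repo ID is an assumption based on this repo's name.
LORA_REPO = "RussRobin/SpatialBot-3B-LoRA"
PRETRAIN_REPO = "RussRobin/SpatialBot-3B-pretrain"


def plan_downloads(root="checkpoints"):
    """Map each repo ID to a local directory named after the repo (hypothetical layout)."""
    root = Path(root)
    return {repo: root / repo.split("/")[-1] for repo in (LORA_REPO, PRETRAIN_REPO)}


def download_checkpoints(root="checkpoints"):
    """Fetch both repos from the Hugging Face Hub (requires network access)."""
    # Imported lazily so that planning paths works without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    paths = {}
    for repo, local_dir in plan_downloads(root).items():
        # snapshot_download returns the local folder containing the repo files.
        paths[repo] = snapshot_download(repo_id=repo, local_dir=str(local_dir))
    return paths
```

Calling `download_checkpoints()` would place each repo in its own folder under `checkpoints/`. How the LoRA weights are then combined with the pretrained checkpoint is described in the GitHub repo linked below, and the merged SpatialBot-3B model is an alternative if you do not need the separate LoRA weights.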
### Paper:
https://arxiv.org/abs/2406.13642
### GitHub repo:
https://github.com/BAAI-DCAI/SpatialBot
<!-- ### SpatialQA, the training set:
https://huggingface.co/datasets/RussRobin/SpatialQA -->
### SpatialBench, the benchmark:
https://huggingface.co/datasets/RussRobin/SpatialBench
### Merged SpatialBot-3B:
https://huggingface.co/RussRobin/SpatialBot-3B