BoyuNLP committed on
Commit
132ea2c
1 Parent(s): 1380629

Update README.md

Files changed (1)
  1. README.md +22 -1
README.md CHANGED
@@ -28,4 +28,25 @@ UGround is a strong GUI visual grounding model trained with a simple recipe. Che
  - [ ] Data Construction Scripts
  - [ ] Guidance of Open-source Data
  - [ ] Full Data
- - [x] Online Demo (HF Spaces)
+ - [x] Online Demo (HF Spaces)
+
+ ## Citation Information
+
+ If you find this work useful, please consider citing our paper:
+
+ ```
+ @article{gou2024uground,
+ title={Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents},
+ author={Boyu Gou and Ruohan Wang and Boyuan Zheng and Yanan Xie and Cheng Chang and Yiheng Shu and Huan Sun and Yu Su},
+ journal={arXiv preprint arXiv:2410.05243},
+ year={2024},
+ url={https://arxiv.org/abs/2410.05243},
+ }
+
+ @article{zheng2023seeact,
+ title={GPT-4V(ision) is a Generalist Web Agent, if Grounded},
+ author={Boyuan Zheng and Boyu Gou and Jihyung Kil and Huan Sun and Yu Su},
+ journal={arXiv preprint arXiv:2401.01614},
+ year={2024},
+ }
+ ```