---
license: odc-by
datasets:
- neulab/MultiUI
language:
- en
base_model:
- Qwen/Qwen2-7B-Instruct
tags:
- GUI
- Agent
- Web
- OCR
- Doc
- VQA
---
#### Model for the paper: [Harnessing Webpage UIs for Text-Rich Visual Understanding](https://arxiv.org/abs/2410.13824)

🌐 [Homepage](https://neulab.github.io/MultiUI/) | 🐍 [GitHub](https://github.com/neulab/multiui) | 📖 [arXiv](https://arxiv.org/abs/2410.13824)

## Introduction
We introduce **MultiUI**, a dataset containing 7.3 million samples from 1 million websites, covering diverse multimodal tasks and UI layouts. Models trained on **MultiUI** not only excel in web UI tasks—achieving up to a 48% improvement on VisualWebBench and a 19.1% boost in action accuracy on the web agent dataset Mind2Web—but also generalize surprisingly well to non-web UI tasks and even to non-UI domains, such as document understanding, OCR, and chart interpretation.

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/vk7yT4Y7ydBOHM6BojmlI.mp4"></video>
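As a point of reference, the sketch below shows one way such a checkpoint could be queried with a webpage screenshot. It assumes the checkpoint is exported in a `transformers`-compatible LLaVA-NeXT format; the repo id, screenshot path, and question are placeholders, and if the checkpoint instead requires the original training codebase, please use the GitHub repository linked above.

```python
# Minimal inference sketch (assumption: the checkpoint loads with transformers'
# LLaVA-NeXT classes; otherwise follow the instructions in the GitHub repo).
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "path/to/this/checkpoint"  # placeholder: replace with the actual repo id

processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any local UI screenshot works here; "screenshot.png" is a placeholder.
image = Image.open("screenshot.png")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is the title of this webpage?"},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```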

## Model Performance

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/h1L7J4rLlq6EOtbiXZjZW.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/NOVQ8WjgJoRm0bzN9zxFx.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/O6GhR1UXOSi7o3yjXvK4e.png)

## Contact
* Junpeng Liu: jpliu@link.cuhk.edu.hk
* Xiang Yue: xyue2@andrew.cmu.edu

## Citation
If you find this work helpful, please cite our paper:
````
@misc{liu2024harnessingwebpageuistextrich,
      title={Harnessing Webpage UIs for Text-Rich Visual Understanding}, 
      author={Junpeng Liu and Tianyue Ou and Yifan Song and Yuxiao Qu and Wai Lam and Chenyan Xiong and Wenhu Chen and Graham Neubig and Xiang Yue},
      year={2024},
      eprint={2410.13824},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.13824}, 
}
````