Qinghong Lin

KevinQHLin

https://qinghonglin.github.io/

AI & ML interests

vision+language

Recent Activity

authored a paper about 2 hours ago

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

upvoted a collection about 9 hours ago

Research on GUI Models

KevinQHLin's activity

authored a paper about 2 hours ago

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published 1 day ago • 46

upvoted 2 collections about 9 hours ago

GUI Models

6 items • Updated 8 days ago • 2

Research on GUI Models

14 items • Updated 8 days ago • 1

Reacted to maxiw's post with 🤗🚀👍 about 9 hours ago

Post

1602

You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

with VisionAgent() as agent:
    # Open Google in a new browser window and give the page a moment to load.
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    # Each click target is described in plain language; model_name picks the hub model used to locate it.
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")

Currently these models are integrated with Gradio Spaces API. Also planning to add local inference soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
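
Since only the four models listed above are supported, a script may want to check a model name before passing it as `model_name`. The following is a minimal sketch of such a check; `SUPPORTED_MODELS` and `require_supported` are hypothetical helpers written for illustration and are not part of the askui package.

```python
# Model names copied from the "Currently supported" list in the post.
SUPPORTED_MODELS = (
    "Qwen/Qwen2-VL-7B-Instruct",
    "Qwen/Qwen2-VL-2B-Instruct",
    "AskUI/PTA-1",
    "OS-Copilot/OS-Atlas-Base-7B",
)


def require_supported(model_name: str) -> str:
    """Hypothetical helper: return model_name if it is listed, else raise ValueError."""
    if model_name not in SUPPORTED_MODELS:
        options = ", ".join(SUPPORTED_MODELS)
        raise ValueError(f"Unsupported model {model_name!r}; expected one of: {options}")
    return model_name
```

It could then wrap the argument in the calls shown in the post, for example `agent.click("text 'Images'", model_name=require_supported("AskUI/PTA-1"))`, so that a typo in the name fails before any request is made.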

2 replies

replied to maxiw's post about 9 hours ago

Hi @maxiw, would you consider integrating our ShowUI?
It is a 2B model built on Qwen2-VL-2B, with strong UI grounding and navigation :)

updated a model about 10 hours ago

showlab/ShowUI-2B

Updated about 10 hours ago • 243 • 17

New activity in showlab/ShowUI about 13 hours ago

Apply for community grant: Academic project (gpu and storage)

#1 opened about 22 hours ago by

New activity in showlab/ShowUI-2B about 14 hours ago

Adding `safetensors` variant of this model

#1 opened 8 days ago by

Adding `safetensors` variant of this model

#2 opened about 18 hours ago by

Adding `safetensors` variant of this model

#3 opened about 16 hours ago by

updated a dataset about 19 hours ago

showlab/ShowUI-desktop-8K

Viewer • Updated about 19 hours ago • 7.5k • 13 • 3

liked a Space about 21 hours ago

Running on Zero

ShowUI

liked a model about 21 hours ago

showlab/ShowUI-2B

Updated about 10 hours ago • 243 • 17

upvoted a paper about 22 hours ago

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published 1 day ago • 46

commented on a paper about 22 hours ago

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published 1 day ago • 46

upvoted a paper 9 days ago

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Paper • 2411.10323 • Published 12 days ago • 27

updated 2 models 11 days ago

showlab/ShowUI-2B

Updated about 10 hours ago • 243 • 17