Qinghong Lin

KevinQHLin

AI & ML interests

vision+language

Recent Activity

upvoted a collection about 9 hours ago
GUI Models
upvoted a collection about 9 hours ago
Research on GUI Models

KevinQHLin's activity

Reacted to maxiw's post with 🤗🚀👍 about 9 hours ago
You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

with VisionAgent() as agent:
    # Open Google in a new browser window
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)  # brief pause so the page can load
    # Each click can be routed to a different model from the hub via model_name
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")


Currently, these models are integrated through the Gradio Spaces API. I'm also planning to add local inference soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
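
Since only the models listed above are supported for now, it may help to check a model name before passing it as `model_name`. Here is a minimal sketch of that idea. `SUPPORTED_MODELS` and `check_model_name` are hypothetical names made up for this example, not part of askui, and the library may well do its own validation.

```python
# Model names copied from the "Currently supported" list in the post.
SUPPORTED_MODELS = (
    "Qwen/Qwen2-VL-7B-Instruct",
    "Qwen/Qwen2-VL-2B-Instruct",
    "AskUI/PTA-1",
    "OS-Copilot/OS-Atlas-Base-7B",
)


def check_model_name(model_name: str) -> str:
    """Return model_name unchanged if it is in the supported list, else raise.

    Hypothetical helper for illustration only; it is not part of askui.
    """
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model {model_name!r}; expected one of: "
            + ", ".join(SUPPORTED_MODELS)
        )
    return model_name
```

With this in place, a call could look like `agent.click("text 'Images'", model_name=check_model_name("AskUI/PTA-1"))`, so a typo in the model name fails early with a readable message.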
replied to maxiw's post about 9 hours ago

Hi @maxiw, would you consider integrating our ShowUI?
It's a 2B model built on Qwen2-VL-2B, with strong UI grounding and navigation :)

New activity in showlab/ShowUI about 13 hours ago
New activity in showlab/ShowUI-2B about 14 hours ago
liked a Space about 21 hours ago