8 5 61

Maxi PRO

maxiw

AI & ML interests

Computer Agents | VLMs

Recent Activity

Reacted to merve's post with 👀 1 day ago

Small yet mighty! 💫 We are releasing SmolVLM: a new 2B small vision language made for on-device use, fine-tunable on consumer GPU, immensely memory efficient 🤠 We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base https://huggingface.co/collections/HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39 Learn more from our blog here: huggingface.co/blog/smolvlm This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO 💝 Try the demo: https://huggingface.co/spaces/HuggingFaceTB/SmolVLM Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb Also TRL integration for DPO 💗

liked a model 1 day ago

HuggingFaceTB/SmolVLM-Instruct

updated a Space 1 day ago

maxiw/Qwen2-VL-Detection

View all activity

Organizations

Posts 6

Post

1545

You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

with VisionAgent() as agent:
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")

Currently these models are integrated with Gradio Spaces API. Also planning to add local inference soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B

Post

1058

🤖 Controlling Computers with Small Models 🤖

We just released PTA-1, a fine-tuned Florence-2 for localization of GUI text and elements. It runs with ~150ms inference time on a RTX 4080. This means you can now start building fast on-device computer use agents!

Model: AskUI/PTA-1
Demo: AskUI/PTA-1

View all posts