maxiw committed on
Commit
8eb6285
1 Parent(s): dcbe50f

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -12,9 +12,9 @@ base_model:
 
  # PTA-1: Controlling Computers with Small Models
 
- PTA (Prompt-to-Automation) is a vision language model for computer use applications based on Florence-2.
- With less than 300M parameters it beats larger models in GUI text and element localization.
- This allows low latency computer automations with local execution.
 
  **Model Input:** Screenshot + description_of_target_element
 
@@ -62,8 +62,8 @@ print(parsed_answer)
 
  ## Evaluation
 
- **Note:** This is a first version of our evaluation with 999 samples (333 samples from each dataset).
- We are still running all models on the full test sets. We are seeing +-5% deviations for a subset of the models we have already evaluated.
 
  | Model | Parameters | Mean | agentsea/wave-ui | AskUI/pta-text | ivelin/rico_refexp_combined |
  |--------------------------------------------|------------|--------|------------------|----------------|-----------------------------|
@@ -83,10 +83,10 @@ We are still running all models on the full test sets. We are seeing +-5% deviat
  \* Models is known to be trained on the train split of that dataset.
 
  The high benchmark scores for our model are partially due to data bias.
- Therefore we expect users of the model to fine-tune it according to the data distributions of their use case.
 
 
  #### Metrics
 
- Click success rate is calculated as the number of clicks inside the target bounding box.
  If a model predicts a target bounding box instead of a click coordinate, its center is used as its click prediction.
 
 
  # PTA-1: Controlling Computers with Small Models
 
+ PTA (Prompt-to-Automation) is a vision language model for computer & phone automation, based on Florence-2.
+ With only 270M parameters it outperforms much larger models in GUI text and element localization.
+ This enables low-latency computer automation with local execution.
 
  **Model Input:** Screenshot + description_of_target_element
 
 
 
  ## Evaluation
 
+ **Note:** This is a first version of our evaluation, based on 999 samples (333 samples from each dataset).
+ We are still running all models on the full test sets, and we are seeing ±5% deviations for a subset of the models we have already evaluated.
 
  | Model | Parameters | Mean | agentsea/wave-ui | AskUI/pta-text | ivelin/rico_refexp_combined |
  |--------------------------------------------|------------|--------|------------------|----------------|-----------------------------|
 
  \* Models is known to be trained on the train split of that dataset.
 
  The high benchmark scores for our model are partially due to data bias.
+ Therefore, we expect users of the model to fine-tune it according to the data distributions of their use case.
 
 
  #### Metrics
 
+ Click success rate is calculated as the number of clicks inside the target bounding box relative to all clicks.
  If a model predicts a target bounding box instead of a click coordinate, its center is used as its click prediction.
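
The click success rate described under Metrics can be sketched in Python as follows. This is a minimal illustration and not the evaluation code behind the table: the function names `to_click` and `click_success_rate`, the `(x_min, y_min, x_max, y_max)` box layout, and the choice to count clicks on the box edge as hits are all assumptions.

```python
from typing import Sequence, Tuple, Union

Point = Tuple[float, float]
# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def to_click(prediction: Union[Point, Box]) -> Point:
    """Return a click point; a predicted box is reduced to its center."""
    if len(prediction) == 4:
        x_min, y_min, x_max, y_max = prediction
        return ((x_min + x_max) / 2, (y_min + y_max) / 2)
    x, y = prediction
    return (x, y)


def click_success_rate(
    predictions: Sequence[Union[Point, Box]],
    targets: Sequence[Box],
) -> float:
    """Fraction of clicks that land inside their target bounding box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = 0
    for prediction, (x_min, y_min, x_max, y_max) in zip(predictions, targets):
        x, y = to_click(prediction)
        # Assumption: points on the box boundary count as inside.
        if x_min <= x <= x_max and y_min <= y <= y_max:
            hits += 1
    return hits / len(predictions)


# Hypothetical example: one direct hit, one box whose center hits, one miss.
example_predictions = [(5, 5), (0, 0, 4, 4), (50, 50)]
example_targets = [(0, 0, 10, 10), (1, 1, 3, 3), (0, 0, 10, 10)]
example_rate = click_success_rate(example_predictions, example_targets)
```

Dividing by the total number of predictions matches the "relative to all clicks" wording, so the result is a fraction between 0 and 1 rather than a raw count.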