Update README.md
README.md
@@ -12,9 +12,9 @@ base_model:

 # PTA-1: Controlling Computers with Small Models

-PTA (Prompt-to-Automation) is a vision language model for computer
-With
-This
+PTA (Prompt-to-Automation) is a vision language model for computer & phone automation, based on Florence-2.
+With only 270M parameters it outperforms much larger models in GUI text and element localization.
+This enables low-latency computer automation with local execution.

 **Model Input:** Screenshot + description_of_target_element

@@ -62,8 +62,8 @@ print(parsed_answer)

 ## Evaluation

-**Note:** This is a first version of our evaluation
-We are still running all models on the full test sets
+**Note:** This is a first version of our evaluation, based on 999 samples (333 samples from each dataset).
+We are still running all models on the full test sets, and we are seeing ±5% deviations for a subset of the models we have already evaluated.

 | Model | Parameters | Mean | agentsea/wave-ui | AskUI/pta-text | ivelin/rico_refexp_combined |
 |--------------------------------------------|------------|--------|------------------|----------------|-----------------------------|
@@ -83,10 +83,10 @@ We are still running all models on the full test sets. We are seeing +-5% deviat
 \* Models is known to be trained on the train split of that dataset.

 The high benchmark scores for our model are partially due to data bias.
-Therefore we expect users of the model to fine-tune it according to the data distributions of their use case.
+Therefore, we expect users of the model to fine-tune it according to the data distributions of their use case.


 #### Metrics

-Click success rate is calculated as the number of clicks inside the target bounding box.
+Click success rate is calculated as the number of clicks inside the target bounding box relative to all clicks.
 If a model predicts a target bounding box instead of a click coordinate, its center is used as its click prediction.
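The updated Metrics section defines click success rate as the number of clicks inside the target bounding box relative to all clicks, where a predicted bounding box is reduced to its center before scoring. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes boxes are `(x_min, y_min, x_max, y_max)` tuples in the same pixel coordinates as clicks, that a click on the box border counts as inside, and that an empty prediction list scores 0.0; the function names `box_center`, `to_click`, `is_inside`, and `click_success_rate` are hypothetical.

```python
from typing import Sequence, Tuple, Union

Point = Tuple[float, float]
# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Point:
    """Return the center of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def to_click(prediction: Union[Point, Box]) -> Point:
    """Reduce a prediction to a click: a 4-value box becomes its center."""
    if len(prediction) == 4:
        return box_center(prediction)  # type: ignore[arg-type]
    return prediction  # type: ignore[return-value]


def is_inside(click: Point, box: Box) -> bool:
    """Check whether a click lies in a box (border counts as inside, an assumption)."""
    x, y = click
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def click_success_rate(
    predictions: Sequence[Union[Point, Box]], targets: Sequence[Box]
) -> float:
    """Fraction of predicted clicks that fall inside their target bounding box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0  # assumed convention for an empty evaluation set
    hits = sum(is_inside(to_click(p), t) for p, t in zip(predictions, targets))
    return hits / len(predictions)


if __name__ == "__main__":
    # Hypothetical example: two click predictions and one box prediction.
    targets = [(10, 10, 50, 30), (100, 200, 140, 240), (0, 0, 20, 20)]
    predictions = [(25, 20), (90, 180, 130, 220), (30, 30)]
    # The box (90, 180, 130, 220) has center (110, 200), which lies on the
    # target's top edge and counts as a hit; (30, 30) misses. 2 of 3 hit.
    print(click_success_rate(predictions, targets))
```

Counting the border as inside is a choice made for this sketch; if the original evaluation uses strict inequalities, only `is_inside` needs to change.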