update demo
Browse files
README.md
CHANGED
@@ -3,7 +3,7 @@ library_name: transformers
|
|
3 |
license: mit
|
4 |
pipeline_tag: image-text-to-text
|
5 |
---
|
6 |
-
📢 [[Project Page](https://microsoft.github.io/OmniParser/)] [[Blog Post](https://www.microsoft.com/en-us/research/articles/omniparser-for-pure-vision-based-gui-agent/)]
|
7 |
|
8 |
# Model Summary
|
9 |
OmniParser is a general screen parsing tool, which interprets/converts UI screenshot to structured format, to improve existing LLM based UI agent.
|
@@ -21,4 +21,7 @@ This model hub includes a finetuned version of YOLOv8 and a finetuned BLIP-2 mod
|
|
21 |
- While OmniParser only converts screenshot image into texts, it can be used to construct an GUI agent based on LLMs that is actionable. When developing and operating the agent using OmniParser, the developers need to be responsible and follow common safety standard.
|
22 |
- For OmniPaser-BLIP2, it may incorrectly infer the gender or other sensitive attribute (e.g., race, religion etc.) of individuals in icon images. Inference of sensitive attributes may rely upon stereotypes and generalizations rather than information about specific individuals and are more likely to be incorrect for marginalized people. Incorrect inferences may result in significant physical or psychological injury or restrict, infringe upon or undermine the ability to realize an individual’s human rights. We do not recommend use of OmniParser in any workplace-like use case scenario.
|
23 |
|
|
|
|
|
|
|
24 |
|
|
|
3 |
license: mit
|
4 |
pipeline_tag: image-text-to-text
|
5 |
---
|
6 |
+
📢 [[Project Page](https://microsoft.github.io/OmniParser/)] [[Blog Post](https://www.microsoft.com/en-us/research/articles/omniparser-for-pure-vision-based-gui-agent/)] [[Demo](https://huggingface.co/spaces/microsoft/OmniParser/)]
|
7 |
|
8 |
# Model Summary
|
9 |
OmniParser is a general screen parsing tool, which interprets/converts UI screenshot to structured format, to improve existing LLM based UI agent.
|
|
|
21 |
- While OmniParser only converts screenshot image into texts, it can be used to construct an GUI agent based on LLMs that is actionable. When developing and operating the agent using OmniParser, the developers need to be responsible and follow common safety standard.
|
22 |
- For OmniPaser-BLIP2, it may incorrectly infer the gender or other sensitive attribute (e.g., race, religion etc.) of individuals in icon images. Inference of sensitive attributes may rely upon stereotypes and generalizations rather than information about specific individuals and are more likely to be incorrect for marginalized people. Incorrect inferences may result in significant physical or psychological injury or restrict, infringe upon or undermine the ability to realize an individual’s human rights. We do not recommend use of OmniParser in any workplace-like use case scenario.
|
23 |
|
24 |
+
# License
|
25 |
+
Please note that icon_detect model is under AGPL license, and icon_caption_blip2 & icon_caption_florence is under MIT license. Please refer to the LICENSE file in the folder of each model.
|
26 |
+
|
27 |
|