EDGEwww25 committed on
Commit
3f91224
1 Parent(s): cc0dff4

update the model card

Files changed (1)
  1. README.md +23 -12
README.md CHANGED
---
license: mit
datasets:
- EDGEwww25/EDGE-Dataset
- liuhaotian/LLaVA-Instruct-150K
- echo840/Monkey_Data
language:
- en
base_model:
- echo840/Monkey-Chat
---
This is the model repository of the paper *EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data*.

The model is fine-tuned from [*Monkey*](https://github.com/Yuliang-Liu/Monkey). To speed up training, we made some minor modifications:
1. Instead of using the LoRA adapters in *Monkey*, the five patches of the raw image are stacked along an extra batch dimension and sent through the image encoder in a single call.
2. Inside the image encoder, we use [*flash attention*](https://github.com/Dao-AILab/flash-attention) instead of the manually implemented attention.
3. Image reading is separated from the forward pass and moved into dataset preprocessing, so that it can be parallelized by PyTorch's `DataLoader`.
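As a rough illustration of the first modification (the encoder, shapes, and names below are toy stand-ins, not the actual Monkey code), stacking the patches along a leading batch-like dimension gives the same result as encoding them one by one, just in a single call:

```python
# Toy sketch of modification 1: rather than running the image encoder
# once per patch, the five patches are stacked along an extra leading
# (batch-like) dimension and encoded together. Plain Python lists stand
# in for tensors; `encode` is a hypothetical stand-in for the encoder.

def encode(batch):
    # Stand-in encoder: reduce each patch (a flat pixel list) to its mean.
    return [sum(patch) / len(patch) for patch in batch]

patches = [[float(i)] * 16 for i in range(5)]  # five toy "patches"

# Per-patch loop (one encoder call per patch):
looped = [encode([p])[0] for p in patches]

# Stacked (one encoder call over all five patches at once):
stacked = encode(patches)

assert looped == stacked
```

With real tensors the same idea amounts to merging the patch axis into the batch axis before the encoder call, so all patches share one forward pass.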

The training dataset (i.e., all training QAs in `.jsonl` format, excluding images) is published in the repository [*EDGE-Dataset*](https://huggingface.co/datasets/EDGEwww25/EDGE-Dataset).

The model training and inference scripts are published in the anonymous repository [*EDGE*](https://anonymous.4open.science/r/EDGE-1CDB).
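The third modification listed above can be sketched with pure-Python stand-ins (the class and function names are hypothetical, not the actual training code): image decoding lives in the dataset's `__getitem__`, so a multi-worker loader can run it off the training loop's critical path, and the forward pass itself does no I/O:

```python
# Toy sketch of modification 3: image reading happens during dataset
# preprocessing, not inside the model's forward pass. The classes below
# mimic the shape of torch.utils.data.Dataset without depending on torch.

def decode_image(path):
    # Stand-in for opening, decoding, and resizing an image file.
    return [0.0] * 4  # pretend pixel buffer

class PreprocessedDataset:
    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        # Decoding happens here, in the data-loading path, where a
        # multi-worker DataLoader can parallelize it...
        return decode_image(self.paths[i])

def forward(batch):
    # ...so the forward pass receives ready-made arrays and does no I/O.
    return [sum(pixels) for pixels in batch]

ds = PreprocessedDataset(["a.png", "b.png"])
batch = [ds[i] for i in range(len(ds))]
out = forward(batch)
```

In the real pipeline the same split lets `DataLoader` workers overlap image reading with GPU computation.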