---
license: apache-2.0
datasets:
- HuggingFaceM4/DocumentVQA
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
---

# Florence-2-finetuned-HuggingFaceM4-DocumentVQA

This model is a fine-tuned version of [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) on the [HuggingFaceM4/DocumentVQA](https://huggingface.co/datasets/HuggingFaceM4/DocumentVQA) dataset.

It is the result of the post [Fine-tuning Florence-2](https://maximofn.com/fine-tuning-florence-2/).

It achieves the following results on the evaluation set:
- Loss: 0.7168

## Model description

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Florence-2 can interpret simple text prompts to perform tasks like captioning, object detection, and segmentation. It leverages the FLD-5B dataset, containing 5.4 billion annotations across 126 million images, to master multi-task learning. The model's sequence-to-sequence architecture enables it to excel in both zero-shot and fine-tuned settings, proving to be a competitive vision foundation model.

This checkpoint has additionally been fine-tuned on the DocVQA task.
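
A minimal inference sketch with `transformers` (the checkpoint id and the local image path below are assumptions to adapt; the `<DocVQA>` task prefix matches the fine-tuning setup described in the post):

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumed checkpoint id; replace with this repository's actual id on the Hub.
model_id = "Maximofn/Florence-2-finetuned-HuggingFaceM4-DocumentVQA"

# Florence-2 ships custom modeling code, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Any document image works; this path is a placeholder.
image = Image.open("document.png").convert("RGB")

# Florence-2 is prompted with a task token followed by the question.
prompt = "<DocVQA>What is the invoice number?"

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
    num_beams=3,
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```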

## Training and evaluation data

This model was fine-tuned on the [HuggingFaceM4/DocumentVQA](https://huggingface.co/datasets/HuggingFaceM4/DocumentVQA) dataset, which can be loaded as sketched below.
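
A rough sketch of pulling the dataset from the Hub with the `datasets` library (the `question`, `answers`, and `image` column names are assumptions based on the usual DocVQA schema):

```python
from datasets import load_dataset

# Download the DocumentVQA splits from the Hugging Face Hub.
ds = load_dataset("HuggingFaceM4/DocumentVQA")

sample = ds["train"][0]
print(sample["question"])  # question asked about the document
print(sample["answers"])   # list of accepted ground-truth answers
sample["image"].show()     # document page as a PIL image
```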

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding optimizer setup follows the list):
- learning_rate: 1e-6
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 3
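
As a rough sketch, these values map onto a PyTorch `AdamW` setup as follows (the base checkpoint load is shown for context; the full training loop lives in the post linked above):

```python
import torch
from transformers import AutoModelForCausalLM

torch.manual_seed(42)  # seed: 42

# Start from the base checkpoint that was fine-tuned.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True
)

# AdamW configured with the hyperparameters listed above.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-6,             # learning_rate
    betas=(0.9, 0.999),  # betas
    eps=1e-8,            # epsilon
)

NUM_EPOCHS = 3  # num_epochs; batches of 8 for both training and evaluation
```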

### Training results

| Training Loss | Epoch | Validation Loss |
|:-------------:|:-----:|:---------------:|
| 1.1535        | 1.0   | 0.7698          |
| 0.6530        | 2.0   | 0.7253          |
| 0.5878        | 3.0   | 0.7168          |

### Framework versions

- Transformers 4.43.3
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1