Maximofn/Florence-2-finetuned-HuggingFaceM4-DocumentVQA


	---
	license: apache-2.0
	datasets:
	- HuggingFaceM4/DocumentVQA
	language:
	- en
	library_name: transformers
	pipeline_tag: image-text-to-text
	---

	# Florence-2-finetuned-HuggingFaceM4-DocumentVQA

	This model is a fine-tuned version of [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) on the [HuggingFaceM4/DocumentVQA](https://huggingface.co/datasets/HuggingFaceM4/DocumentVQA) dataset.

	It was trained as part of the post [Fine tuning Florence-2](https://maximofn.com/fine-tuning-florence-2/).

	It achieves the following results on the evaluation set:
	- Loss: 0.7168

	## Model description

	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. It interprets simple text prompts to perform tasks such as captioning, object detection, and segmentation. It was pretrained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, to master multi-task learning. Its sequence-to-sequence architecture lets it perform well in both zero-shot and fine-tuned settings, making it a competitive vision foundation model.

	This checkpoint has additionally been fine-tuned on the DocVQA (document visual question answering) task.
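Since the model is driven by text prompts, a minimal inference sketch may help. It makes several assumptions that this card does not confirm: that the repository id is `Maximofn/Florence-2-finetuned-HuggingFaceM4-DocumentVQA` (inferred from the page header), that the processor can be loaded from the same repository, and that questions were prefixed with a `<DocVQA>` task token during fine-tuning. The helper names `build_prompt` and `answer_question` are hypothetical.

```python
MODEL_ID = "Maximofn/Florence-2-finetuned-HuggingFaceM4-DocumentVQA"  # assumed repo id
TASK_PREFIX = "<DocVQA>"  # assumption: task token used as prompt prefix during fine-tuning


def build_prompt(question: str) -> str:
    """Prepend the (assumed) task token to the user's question."""
    return TASK_PREFIX + question.strip()


def answer_question(image_path: str, question: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and generate an answer for one document image."""
    # Heavy dependencies are imported lazily so build_prompt works without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Florence-2 ships custom modeling code, hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = model.to(device).eval()
    # Assumption: the processor files are available in the same repository.
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(question), images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=max_new_tokens,
            num_beams=3,
        )
    # Decode the generated token ids back to plain text.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

A call would look like `answer_question("invoice.png", "What is the invoice date?")`, where `invoice.png` is a placeholder path. If the processor cannot be loaded from this repository, loading it from `microsoft/Florence-2-base-ft` is a plausible fallback.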

	## Training and evaluation data

	The model was fine-tuned and evaluated on the [HuggingFaceM4/DocumentVQA](https://huggingface.co/datasets/HuggingFaceM4/DocumentVQA) dataset.

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-6
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
	- num_epochs: 3
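The hyperparameters above can be collected in a small configuration object, sketched below. The class name `TrainConfig` and its methods are hypothetical and not taken from the original training script. The step count assumes no gradient accumulation and that the last partial batch is kept, neither of which this card states.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters listed in this model card."""
    learning_rate: float = 1e-6
    train_batch_size: int = 8
    eval_batch_size: int = 8
    seed: int = 42
    betas: tuple = (0.9, 0.999)
    epsilon: float = 1e-8
    num_epochs: int = 3

    def adamw_kwargs(self) -> dict:
        """Keyword arguments in the form torch.optim.AdamW accepts."""
        return {"lr": self.learning_rate, "betas": self.betas, "eps": self.epsilon}

    def total_steps(self, num_train_examples: int) -> int:
        """Optimizer steps over all epochs (assumes no gradient accumulation)."""
        steps_per_epoch = math.ceil(num_train_examples / self.train_batch_size)
        return steps_per_epoch * self.num_epochs


cfg = TrainConfig()
```

With PyTorch installed, the optimizer could then be built as `torch.optim.AdamW(model.parameters(), **cfg.adamw_kwargs())`.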

	### Training results

	| Training Loss | Epoch | Validation Loss |
	|:-------------:|:-----:|:---------------:|
	| 1.1535 | 1.0 | 0.7698 |
	| 0.6530 | 2.0 | 0.7253 |
	| 0.5878 | 3.0 | 0.7168 |
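To read the table more easily, the short sketch below computes, for each epoch, the gap between validation and training loss and the relative improvement in validation loss over the previous epoch. The function name `summarize` is hypothetical, and the interpretation assumes the two loss columns are directly comparable, which the card does not state.

```python
# (epoch, training loss, validation loss) copied from the table above
ROWS = [(1, 1.1535, 0.7698), (2, 0.6530, 0.7253), (3, 0.5878, 0.7168)]


def summarize(rows):
    """Return the per-epoch val-train gap and relative validation improvement."""
    summary = []
    prev_val = None
    for epoch, train_loss, val_loss in rows:
        rel = None if prev_val is None else round((prev_val - val_loss) / prev_val, 4)
        summary.append({
            "epoch": epoch,
            "gap": round(val_loss - train_loss, 4),
            "val_rel_improvement": rel,
        })
        prev_val = val_loss
    return summary


summary = summarize(ROWS)
for row in summary:
    print(row)
```

Validation loss keeps falling, but the relative gain shrinks from roughly 5.8% (epoch 1 to 2) to roughly 1.2% (epoch 2 to 3), while the gap to the training loss widens.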


	### Framework versions

	- Transformers 4.43.3
	- Pytorch 2.3.1+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1
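
To reproduce the environment, it can help to compare installed package versions against the ones listed above. The sketch below uses only the standard library. The mapping of "Pytorch" to the `torch` distribution name is an assumption, and `version_report` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this model card; "torch" is the assumed distribution name for Pytorch.
PINNED = {
    "transformers": "4.43.3",
    "torch": "2.3.1+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_report(pinned=PINNED):
    """Compare installed distribution versions with the pinned ones."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = {"wanted": wanted, "installed": installed, "match": installed == wanted}
    return report


for name, info in version_report().items():
    print(name, info)
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the local setup differs from the one used for training.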