bastao/PeroVaz_PT-BR_Classifier


	---
	license: mit
	datasets:
	- LemeExploreNau/VeraCruz
	language:
	- pt
	metrics:
	- accuracy
	tags:
	- Portuguese
	- Brazilian
	- Language Classification
	---

	# PeroVazPT-BR Classifier

	## Model Description
	The PeroVazPT-BR Classifier labels a piece of Portuguese text as either European Portuguese (PT) or Brazilian Portuguese (BR).

	This model is a fine-tuned version of [prajjwal1/bert-tiny](https://huggingface.co/prajjwal1/bert-tiny) on the [VeraCruz Dataset](https://huggingface.co/datasets/LemeExploreNau/VeraCruz), a collection of text samples in both variants.
	It was trained on a total of 500,000 examples, split evenly between European Portuguese and Brazilian Portuguese so that both variants are equally represented.

	It achieves the following results on an evaluation set of 50,000 examples:
	- Loss: 0.1791
	- Accuracy: 0.9461
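
A minimal inference sketch, assuming the checkpoint is published on the Hub as `bastao/PeroVaz_PT-BR_Classifier` together with its tokenizer and loads through the standard Transformers `pipeline` API. The helper name `classify` and the sample sentences are illustrative only. This card does not document the label names, so inspect the returned `label` values (they may be generic ids such as `LABEL_0`) before mapping them to PT or BR.

```python
MODEL_ID = "bastao/PeroVaz_PT-BR_Classifier"  # assumed Hub repository id


def classify(texts, model_id=MODEL_ID):
    """Return one {'label': ..., 'score': ...} dict per input text."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # truncation=True guards against inputs longer than the model's maximum length.
    return classifier(list(texts), truncation=True)


if __name__ == "__main__":
    samples = [
        "O autocarro chegou atrasado à paragem.",  # vocabulary typical of Portugal
        "O ônibus chegou atrasado no ponto.",      # vocabulary typical of Brazil
    ]
    for text, result in zip(samples, classify(samples)):
        print(f"{result['label']} ({result['score']:.3f}): {text}")
```

Running the script downloads the weights on first use. No expected output is shown because the label strings depend on the model's configuration.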

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 256
	- eval_batch_size: 256
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- steps: 2500
	- mixed_precision_training: Native AMP
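
The list above could map onto Transformers `TrainingArguments` roughly as follows. This is a sketch under assumptions, not the original training script: `output_dir` is a hypothetical path, the batch size of 256 is assumed to be per device on a single device, and "Native AMP" is assumed to correspond to `fp16=True`, which needs a CUDA device.

```python
# Keyword arguments mirroring the hyperparameters listed in this card.
TRAINING_KWARGS = {
    "output_dir": "perovaz-pt-br-classifier",  # hypothetical output path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 256,  # assumes a single training device
    "per_device_eval_batch_size": 256,
    "seed": 42,
    "adam_beta1": 0.9,       # betas=(0.9, 0.999)
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "max_steps": 2500,
    "fp16": True,            # assumed equivalent of "Native AMP"
}


def build_training_arguments(**overrides):
    """Create TrainingArguments from the card's values, with optional overrides."""
    # Imported lazily: constructing the object with fp16=True may fail without a GPU.
    from transformers import TrainingArguments

    return TrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

On a machine without a GPU, `build_training_arguments(fp16=False)` is a reasonable way to try the configuration.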

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy |
	|:-------------:|:-----:|:----:|:---------------:|:--------:|
	| 0.4772        | 0.06  | 500  | 0.2501          | 0.9080   |
	| 0.3412        | 0.13  | 1000 | 0.2275          | 0.9135   |
	| 0.3122        | 0.19  | 1500 | 0.2578          | 0.9014   |
	| 0.2975        | 0.25  | 2000 | 0.1992          | 0.9396   |
	| 0.2877        | 0.31  | 2500 | 0.1791          | 0.9461   |
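
To put the accuracies in the table into concrete terms, the sketch below converts each checkpoint's accuracy into an approximate count of misclassified examples. It assumes every checkpoint was scored on the same 50,000-example evaluation set that this card reports for the final result. The names `RESULTS`, `summarize`, and `BEST` are illustrative.

```python
EVAL_SET_SIZE = 50_000  # evaluation set size stated in this card

# (step, training loss, validation loss, accuracy), copied from the table
RESULTS = [
    (500, 0.4772, 0.2501, 0.9080),
    (1000, 0.3412, 0.2275, 0.9135),
    (1500, 0.3122, 0.2578, 0.9014),
    (2000, 0.2975, 0.1992, 0.9396),
    (2500, 0.2877, 0.1791, 0.9461),
]


def summarize(results=RESULTS, eval_size=EVAL_SET_SIZE):
    """Attach an approximate misclassification count to each checkpoint."""
    rows = []
    for step, _train_loss, val_loss, accuracy in results:
        errors = round((1 - accuracy) * eval_size)
        rows.append(
            {"step": step, "val_loss": val_loss, "accuracy": accuracy, "errors": errors}
        )
    return rows


SUMMARY = summarize()
# Checkpoint with the lowest validation loss.
BEST = min(SUMMARY, key=lambda row: row["val_loss"])

if __name__ == "__main__":
    for row in SUMMARY:
        print(f"step {row['step']:>4}: ~{row['errors']} errors, val loss {row['val_loss']}")
    print("best checkpoint by validation loss: step", BEST["step"])
```

Under that assumption, the final checkpoint misclassifies roughly 2,695 of the 50,000 evaluation examples, down from about 4,600 at step 500. The step 1500 checkpoint is temporarily worse than the one before it.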

	### Framework versions

	- Transformers 4.40.0.dev0
	- PyTorch 2.2.1
	- Datasets 2.18.0
	- Tokenizers 0.15.2