metadata
license: mit
datasets:
  - LemeExploreNau/VeraCruz
language:
  - pt
metrics:
  - accuracy
tags:
  - Portuguese
  - Brazilian
  - Language Classification

PeroVazPT-BR Classifier

Model Description

The PeroVazPT-BR Classifier classifies text as either European Portuguese (PT) or Brazilian Portuguese (BR).

This model is a fine-tuned version of prajjwal1/bert-tiny on the VeraCruz Dataset, a collection of text samples from both language variants. It was trained on a total of 500,000 examples, evenly split between European Portuguese and Brazilian Portuguese, so that both variants are equally represented.
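A minimal usage sketch, assuming the model is published on the Hugging Face Hub and loaded through the standard `text-classification` pipeline. The card does not state the repository id or the label names, so `model_id` is a caller-supplied placeholder and the helpers `top_label` and `classify` are hypothetical names introduced here for illustration.

```python
from typing import Any, Dict, List, Tuple


def top_label(scores: List[Dict[str, Any]]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


def classify(texts: List[str], model_id: str) -> List[Tuple[str, float]]:
    """Classify each text with a Hub text-classification model.

    `model_id` is an assumption: pass the repository id of this model,
    which the card itself does not state.
    """
    # Imported lazily so top_label() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id, top_k=None)
    # With top_k=None the pipeline returns, for every input text, a list of
    # {"label": ..., "score": ...} dicts covering all classes.
    return [top_label(scores) for scores in classifier(texts)]
```

A call would look like `classify(["O autocarro chegou atrasado.", "O ônibus chegou atrasado."], model_id="<namespace>/<model-name>")`. The label strings returned depend on the model's configuration and may be generic names such as `LABEL_0` and `LABEL_1`.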

It achieves the following results on an evaluation set of 50,000 examples:

  • Loss: 0.1791
  • Accuracy: 0.9461
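To make the accuracy figure concrete, the sketch below shows how accuracy is conventionally computed and what 0.9461 implies on 50,000 examples. The function name and the sample labels are illustrative, and because the reported accuracy is rounded to four decimals, the derived counts are approximate.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Figures reported in this card.
EVAL_SIZE = 50_000
REPORTED_ACCURACY = 0.9461

# Approximate counts implied by the rounded accuracy.
approx_correct = round(REPORTED_ACCURACY * EVAL_SIZE)
approx_errors = EVAL_SIZE - approx_correct
print(approx_correct, approx_errors)
```

In other words, roughly 2,700 of the 50,000 evaluation examples were misclassified.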

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 256
  • eval_batch_size: 256
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • steps: 2500
  • mixed_precision_training: Native AMP
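The linear scheduler listed above can be sketched in plain Python as follows. This assumes no warmup steps, since the card does not report any, and a decay to zero at the final step. `linear_lr` is a hypothetical helper for illustration, not the training code itself.

```python
def linear_lr(
    step: int,
    base_lr: float = 5e-5,
    total_steps: int = 2500,
    warmup_steps: int = 0,  # assumption: the card does not mention warmup
) -> float:
    """Learning rate at `step` under a linear warmup/decay schedule."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 500, 1250, 2500):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate starts at 5e-05, is halved by step 1250, and reaches zero at step 2500.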

Training results

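| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 0.4772        | 0.06  | 500  | 0.2501          | 0.9080   |
| 0.3412        | 0.13  | 1000 | 0.2275          | 0.9135   |
| 0.3122        | 0.19  | 1500 | 0.2578          | 0.9014   |
| 0.2975        | 0.25  | 2000 | 0.1992          | 0.9396   |
| 0.2877        | 0.31  | 2500 | 0.1791          | 0.9461   |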
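The training log above can be inspected programmatically. The sketch below copies the rows verbatim, then picks the checkpoint with the lowest validation loss and flags steps where validation loss rose relative to the previous evaluation. The `Checkpoint` type and the variable names are illustrative.

```python
from typing import List, NamedTuple


class Checkpoint(NamedTuple):
    step: int
    train_loss: float
    val_loss: float
    accuracy: float


# Rows copied from the training results in this card.
log: List[Checkpoint] = [
    Checkpoint(500, 0.4772, 0.2501, 0.9080),
    Checkpoint(1000, 0.3412, 0.2275, 0.9135),
    Checkpoint(1500, 0.3122, 0.2578, 0.9014),
    Checkpoint(2000, 0.2975, 0.1992, 0.9396),
    Checkpoint(2500, 0.2877, 0.1791, 0.9461),
]

# Checkpoint with the lowest validation loss.
best = min(log, key=lambda c: c.val_loss)

# Steps where validation loss increased compared with the previous evaluation.
regressions = [
    cur.step for prev, cur in zip(log, log[1:]) if cur.val_loss > prev.val_loss
]

print(best.step, regressions)
```

The final checkpoint is the best one by both validation loss and accuracy, and the only temporary regression occurs at step 1500.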

Framework versions

  • Transformers 4.40.0.dev0
  • PyTorch 2.2.1
  • Datasets 2.18.0
  • Tokenizers 0.15.2