---
language: en
license: afl-3.0
tags:
- generated_from_trainer
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: covid-twitter-bert-v2-struth
  results: []
widget:
- text: COVID vaccines can prevent serious illness and death from COVID-19
  example_title: Real Tweet
- text: >-
    COVID vaccines are not effective at protecting you from serious illness
    and death from COVID-19
  example_title: Fake Tweet
---
# covid-twitter-bert-v2-struth

This model is a fine-tuned version of [digitalepidemiologylab/covid-twitter-bert-v2](https://huggingface.co/digitalepidemiologylab/covid-twitter-bert-v2) on the COVID-19 Fake News Dataset NLP by Elvin Aghammadzada. It achieves the following results on the evaluation set:
- Loss: 0.1171
- Accuracy: 0.9662
- Precision: 0.9813
- Recall: 0.9493
- F1: 0.9650
## Model description

This model builds on the work of the Digital Epidemiology Lab and their COVID-Twitter-BERT model. We have extended their model by fine-tuning it for sequence classification tasks. This is part of a wider project on true/fake news detection by the Struth Social team.
## Intended uses & limitations

This model is intended for classifying Tweets as either true or fake (0 or 1). It can also handle relatively complex statements regarding COVID-19.

A known limitation is that the model struggles with basic statements (e.g. "COVID is a hoax"), as the Tweets used to train it are not simplistic in nature.
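As a usage sketch, the model can be loaded through the standard `transformers` text-classification pipeline. The repository id below is a placeholder for the full Hub path, and the label interpretation (0 = true, 1 = fake) follows the description above rather than the model's configured label names:

```python
from transformers import pipeline

# Placeholder repo id -- substitute the full Hub path of this model.
classifier = pipeline("text-classification", model="covid-twitter-bert-v2-struth")

result = classifier("COVID vaccines can prevent serious illness and death from COVID-19")
print(result)
# e.g. [{'label': 'LABEL_0', 'score': 0.99}]
# Assumption: LABEL_0 = true and LABEL_1 = fake, per the 0/1 mapping described above.
```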
## Training and evaluation data

The training and testing data were split 80:20 to produce the results listed above.
Training/Testing Set:
- Samples Total: 8437
- Samples Train: 6749
- Samples Test: 1687
Evaluation Set:
- Samples Total: 100
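As an illustration of the 80:20 split (the actual preprocessing is done by custom scripts, so the file name and column names here are assumptions), a scikit-learn sketch would look like:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; the real preprocessing scripts are custom.
df = pd.read_csv("covid19_fake_news.csv")  # 8437 samples total

train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["label"]
)
print(len(train_df), len(test_df))  # approximately the train/test counts listed above
```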
## Training procedure
- Data is preprocessed through custom scripts
- Data is passed to the model training script
- Training is conducted
- Best model is retrieved at end of training and uploaded to the Hub
### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map to `TrainingArguments` follows the list):
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
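A minimal sketch of how these settings map onto the `transformers` Trainer API; the actual training script is custom, and `train_dataset`/`test_dataset` are assumed to be tokenized datasets prepared from the split above. The Adam betas and epsilon listed are the `TrainingArguments` defaults:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "digitalepidemiologylab/covid-twitter-bert-v2", num_labels=2
)
tokenizer = AutoTokenizer.from_pretrained("digitalepidemiologylab/covid-twitter-bert-v2")

def compute_metrics(eval_pred):
    # Produces the accuracy/precision/recall/F1 metrics reported above.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

args = TrainingArguments(
    output_dir="covid-twitter-bert-v2-struth",  # placeholder output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    num_train_epochs=20,
    lr_scheduler_type="linear",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,  # best model retrieved at end of training
    push_to_hub=True,             # and uploaded to the Hub
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # assumed: tokenized train split from above
    eval_dataset=test_dataset,    # assumed: tokenized test split from above
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
```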
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 0.1719 | 1.0 | 422 | 0.1171 | 0.9662 | 0.9813 | 0.9493 | 0.9650 |
| 0.0565 | 2.0 | 844 | 0.1595 | 0.9621 | 0.9421 | 0.9831 | 0.9622 |
| 0.0221 | 3.0 | 1266 | 0.2059 | 0.9585 | 0.9859 | 0.9287 | 0.9565 |
| 0.009 | 4.0 | 1688 | 0.1378 | 0.9722 | 0.9600 | 0.9843 | 0.9720 |
| 0.0021 | 5.0 | 2110 | 0.2013 | 0.9722 | 0.9863 | 0.9565 | 0.9712 |
| 0.0069 | 6.0 | 2532 | 0.2894 | 0.9615 | 0.9948 | 0.9263 | 0.9593 |
| 0.0054 | 7.0 | 2954 | 0.2692 | 0.9650 | 0.9949 | 0.9336 | 0.9632 |
| 0.0058 | 8.0 | 3376 | 0.2406 | 0.9639 | 0.9776 | 0.9481 | 0.9626 |
| 0.0017 | 9.0 | 3798 | 0.1877 | 0.9722 | 0.9654 | 0.9783 | 0.9718 |
| 0.0019 | 10.0 | 4220 | 0.2761 | 0.9686 | 0.9850 | 0.9505 | 0.9674 |
| 0.007 | 11.0 | 4642 | 0.1889 | 0.9722 | 0.9875 | 0.9553 | 0.9711 |
| 0.0007 | 12.0 | 5064 | 0.2774 | 0.9662 | 0.9837 | 0.9469 | 0.9649 |
| 0.0008 | 13.0 | 5486 | 0.2344 | 0.9722 | 0.9791 | 0.9638 | 0.9714 |
| 0.0 | 14.0 | 5908 | 0.2768 | 0.9662 | 0.9789 | 0.9517 | 0.9651 |
| 0.0 | 15.0 | 6330 | 0.2798 | 0.9662 | 0.9789 | 0.9517 | 0.9651 |
| 0.0 | 16.0 | 6752 | 0.2790 | 0.9668 | 0.9789 | 0.9529 | 0.9657 |
| 0.0 | 17.0 | 7174 | 0.2850 | 0.9668 | 0.9789 | 0.9529 | 0.9657 |
| 0.0 | 18.0 | 7596 | 0.2837 | 0.9668 | 0.9789 | 0.9529 | 0.9657 |
| 0.0 | 19.0 | 8018 | 0.2835 | 0.9674 | 0.9789 | 0.9541 | 0.9664 |
| 0.0 | 20.0 | 8440 | 0.2842 | 0.9674 | 0.9789 | 0.9541 | 0.9664 |
### Framework versions
- Transformers 4.22.2
- Pytorch 1.12.1+cu113
- Datasets 2.5.1
- Tokenizers 0.12.1