hipnologo committed on
Commit 69d2293
1 Parent(s): 4959b7c

Update README.md

Files changed (1)
  1. README.md +28 -2
README.md CHANGED
@@ -10,6 +10,7 @@ tags:
 - gpt2
 - sentiment-analysis
 - fine-tuned
+license: mit
 ---

 # Fine-tuned GPT-2 Model for IMDb Movie Review Sentiment Analysis
@@ -52,6 +53,31 @@ print(f"The sentiment predicted by the model is: {'Positive' if predicted_class
 ## Training Procedure
 The model was trained using the `Trainer` class from the transformers library, with a learning rate of 2e-5, a batch size of 1, and 3 training epochs.

-## Fine-tuning Details
-The model was fine-tuned using the IMDb movie review dataset.
+## Evaluation
+
+The fine-tuned model was evaluated on the test dataset. Here are the results:
+
+- **Evaluation Loss**: 0.23127
+- **Evaluation Accuracy**: 0.94064
+- **Evaluation F1 Score**: 0.94104
+- **Evaluation Precision**: 0.93466
+- **Evaluation Recall**: 0.94752
+
+The evaluation metrics suggest that the model has high accuracy and a good precision-recall balance for the task of sentiment classification.
+
+### How to Reproduce
+
+The evaluation results can be reproduced by loading the model and the tokenizer from the Hugging Face Model Hub and running the model on the evaluation dataset with the `Trainer` class from the Transformers library, using the `compute_metrics` function defined above.
+
+The evaluation loss is the cross-entropy loss of the model on the evaluation dataset, a measure of how well the model's predictions match the actual labels. The closer it is to zero, the better.
+
+The evaluation accuracy is the proportion of predictions the model got right. It ranges from 0 to 1, with 1 meaning every prediction was correct.
+
+The F1 score is a measure of a test's accuracy that considers both precision (the number of true positives divided by the number of all positive predictions) and recall (the number of true positives divided by the number of all samples that should have been identified as positive). The F1 score reaches its best value at 1 (perfect precision and recall) and its worst at 0.
+
+The evaluation precision is the fraction of samples classified as positive that were actually positive. The closer it is to 1, the better.
+
+The evaluation recall is the fraction of actual positives the model captured by labeling them as positive. The closer it is to 1, the better.
+
+## Fine-tuning Details
+The model was fine-tuned using the IMDb movie review dataset.
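
The "How to Reproduce" paragraph added in this commit refers to a `compute_metrics` function "defined above", which is not visible in this diff. Below is a minimal sketch of what that evaluation loop could look like; the Hub repo id is a placeholder and `compute_metrics` is reconstructed from the reported metrics (accuracy, F1, precision, recall), not the author's exact code.

```python
# Minimal sketch of the evaluation described in "How to Reproduce".
# Assumptions (not part of the commit): MODEL_ID is a placeholder repo id,
# and compute_metrics is reconstructed, not the author's original definition.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "your-username/gpt2-imdb-sentiment"  # placeholder, not the real repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

# GPT-2 has no padding token by default; reuse EOS so batched evaluation works.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.pad_token_id

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

test_ds = load_dataset("imdb", split="test").map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="eval_out", per_device_eval_batch_size=1),
    compute_metrics=compute_metrics,
)
print(trainer.evaluate(eval_dataset=test_ds))
```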
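
As a quick sanity check (not part of the commit), the reported F1 score is consistent with being the harmonic mean of the reported precision and recall:

```python
# Harmonic mean of the reported precision and recall, compared with the
# reported F1 of 0.94104; the small difference is rounding of the inputs.
precision, recall = 0.93466, 0.94752
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 5))  # ≈ 0.94105
```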