# distilbert-truncated

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the 20 Newsgroups dataset. It achieves the results reported under Training results below.
## Training and evaluation data
The dataset was split into training and test sets: the model was trained on 90% of the data, with the remaining 10% of the original dataset held out for testing.
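The card does not include the loading/splitting code; below is a minimal sketch of a 90/10 split, assuming scikit-learn's `fetch_20newsgroups` loader and `train_test_split` (the `random_state` and stratification are illustrative choices, not taken from the original run).

```python
# Minimal sketch of a 90/10 train/test split of 20 Newsgroups.
# The loader, random_state and stratification are assumptions, not from the original run.
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split

newsgroups = fetch_20newsgroups(subset="all")
texts, labels = newsgroups.data, newsgroups.target

# 90% of the data for training, 10% held out for testing.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.1, random_state=42, stratify=labels
)
```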
## Training procedure
DistilBERT has a maximum input length of 512 tokens, so with this in mind the following was performed:

- I used the `distilbert-base-uncased` pretrained model to initialize an `AutoTokenizer`.
- Setting a maximum length of 256, each entry in the training, testing and validation data was truncated if it exceeded the limit and padded if it fell short of it (see the sketch below).
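A minimal sketch of that tokenization step, assuming the `train_texts`/`test_texts` lists from the split sketch above; the truncation and padding behaviour comes directly from the tokenizer arguments:

```python
# Tokenize with DistilBERT's tokenizer, truncating/padding every entry to 256 tokens.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

MAX_LENGTH = 256  # half of DistilBERT's 512-token limit

def tokenize(texts):
    # Entries longer than MAX_LENGTH are truncated; shorter ones are padded up to it.
    return tokenizer(
        texts,
        max_length=MAX_LENGTH,
        truncation=True,
        padding="max_length",
        return_tensors="tf",
    )

train_encodings = tokenize(train_texts)  # from the split sketch above
test_encodings = tokenize(test_texts)
print(train_encodings["input_ids"].shape)  # (num_train_examples, 256)
```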
### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: {'name': 'Adam', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': True, 'is_legacy_optimizer': False, 'learning_rate': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 1908, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}}, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False}
- training_precision: float32
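The dump above corresponds to a plain Keras Adam optimizer driving a linear `PolynomialDecay` of the learning rate from 2e-5 down to 0 over the 1908 training steps. A reconstruction of that configuration (inferred from the logged config, not copied from the original training script) could look like this:

```python
import tensorflow as tf

# Linear decay (power=1.0) of the learning rate from 2e-5 to 0 over 1908 steps,
# matching the PolynomialDecay config logged above.
lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=2e-5,
    decay_steps=1908,
    end_learning_rate=0.0,
    power=1.0,
    cycle=False,
)

# Plain Adam (no weight decay), matching the logged beta/epsilon values.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=lr_schedule,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-8,
)
```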
### Training results
- Epochs: 3
- Batches per epoch: 636
- Total training steps: 1908
- Model accuracy: 0.8337758779525757
- Model loss: 0.568471074104309
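The step counts are consistent: 636 batches per epoch × 3 epochs = 1908 total steps, which is exactly the `decay_steps` used by the learning-rate schedule. A hedged sketch of the compile/fit step with TensorFlow/Keras follows; `tf_train_dataset` and `tf_test_dataset` are assumed `tf.data.Dataset` objects built from the encodings above, not names from the original code.

```python
from transformers import TFAutoModelForSequenceClassification

# 20 Newsgroups has 20 target classes.
model = TFAutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=20
)

# With no loss passed to compile(), the transformers TF models fall back to
# their built-in classification loss; `optimizer` is the reconstruction above.
model.compile(optimizer=optimizer, metrics=["accuracy"])

model.fit(
    tf_train_dataset,            # assumed: 636 batches per epoch
    validation_data=tf_test_dataset,
    epochs=3,                    # 636 batches/epoch * 3 epochs = 1908 steps
)
```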
### Framework versions
- Transformers 4.28.0
- TensorFlow 2.12.0
- Datasets 2.12.0
- Tokenizers 0.13.3