
GeoBERT_Analyzer

GeoBERT_Analyzer is a text classification model fine-tuned from GeoBERT on the Geoscientific Corpus dataset. It was trained on the Labeled Geoscientific & Non-Geoscientific Corpus dataset (21416 x 2 sentences).

Intended uses

The training aims to give the language model the ability to distinguish Geoscience text from Non-Geoscience (General) text.

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 14000, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
  • training_precision: mixed_float16
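The learning-rate schedule in the optimizer configuration above can be sketched in plain Python. This is a minimal illustration of the standard polynomial decay formula (without cycling) using the listed values; the function name is illustrative and is not taken from the original training script.

```python
def polynomial_decay(step,
                     initial_learning_rate=2e-05,
                     decay_steps=14000,
                     end_learning_rate=0.0,
                     power=1.0):
    """Learning rate at a given training step under polynomial decay (cycle=False)."""
    # Without cycling, the step is clamped so the rate stays at
    # end_learning_rate once decay_steps has been reached.
    step = min(step, decay_steps)
    remaining = 1.0 - step / decay_steps
    return (initial_learning_rate - end_learning_rate) * remaining ** power + end_learning_rate


# Print the rate at the start, midpoint, and end of the schedule.
for s in (0, 7000, 14000):
    print(s, polynomial_decay(s))
```

With power set to 1.0, the schedule is a straight line from 2e-05 down to 0.0 over 14000 steps.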

Framework versions

  • Transformers 4.22.1
  • TensorFlow 2.10.0
  • Datasets 2.4.0
  • Tokenizers 0.12.1

Model performances (metric: seqeval)

entity      precision  recall  f1
General     0.9976     0.9980  0.9978
Geoscience  0.9980     0.9984  0.9982
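As a quick consistency check, the f1 column can be recomputed from the reported precision and recall as their harmonic mean. The helper below is an illustrative sketch and is not part of the original evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
reported = {
    "General": (0.9976, 0.9980),
    "Geoscience": (0.9980, 0.9984),
}

for entity, (p, r) in reported.items():
    print(entity, round(f1_score(p, r), 4))
```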

How to use GeoBERT with HuggingFace

Load the model and its sub-word tokenizer:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# If the repository only provides TensorFlow weights, load the model with
# TFAutoModelForSequenceClassification instead.
tokenizer = AutoTokenizer.from_pretrained("botryan96/GeoBERT_analyzer")
model = AutoModelForSequenceClassification.from_pretrained("botryan96/GeoBERT_analyzer")

# Define the pipeline
analyze_machine = pipeline('text-classification', model=model, tokenizer=tokenizer)

# Define the sentences
sentences = ['the average iron and sulfate concentrations were calculated to be 19 . 6 5 . 2 and 426 182 mg / l , respectively .',
            'She first gained media attention as a friend and stylist of Paris Hilton']

# Deploy the machine
analyze_machine(sentences)
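A text-classification pipeline returns one dictionary per input sentence, each with a 'label' and a 'score' key. The sketch below pairs each sentence with its prediction. The helper name and the sample predictions are hypothetical placeholders, not real model output, and the actual label strings depend on the model configuration.

```python
def pair_predictions(sentences, predictions):
    """Pair each input sentence with its predicted label and score."""
    if len(sentences) != len(predictions):
        raise ValueError("expected one prediction per sentence")
    return [
        (sentence, prediction["label"], prediction["score"])
        for sentence, prediction in zip(sentences, predictions)
    ]


# Hypothetical placeholder values in the pipeline's output format;
# these are NOT results produced by the model.
example_sentences = ["sentence about rocks", "sentence about fashion"]
example_predictions = [
    {"label": "Geoscience", "score": 0.99},
    {"label": "General", "score": 0.98},
]

for sentence, label, score in pair_predictions(example_sentences, example_predictions):
    print(f"{label} ({score:.2f}): {sentence}")
```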