GeoBERT
GeoBERT is a NER model that was fine-tuned from SciBERT on the Geoscientific Corpus dataset. The model was trained on the Labeled Geoscientific Corpus dataset (~1 million sentences).
Intended uses
The NER product in this model has a goal to identify four main semantic types or categories related to Geosciences.
- GeoPetro for any entities that belong to all terms in Geosciences
- GeoMeth for all tools or methods associated with Geosciences
- GeoLoc to identify geological locations
- GeoTime for identifying the geological time scale entities
Training hyperparameters
The following hyperparameters were used during training:
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 14000, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: mixed_float16
Framework versions
- Transformers 4.22.1
- TensorFlow 2.10.0
- Datasets 2.4.0
- Tokenizers 0.12.1
Model performances (metric: seqeval)
entity | precision | recall | f1 |
---|---|---|---|
GeoLoc | 0.9727 | 0.9591 | 0.9658 |
GeoMeth | 0.9433 | 0.9447 | 0.9445 |
GeoPetro | 0.9767 | 0.9745 | 0.9756 |
GeoTime | 0.9695 | 0.9666 | 0.9680 |
How to use GeoBERT with HuggingFace
Load GeoBERT and its sub-word tokenizer :
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("botryan96/GeoBERT")
model = AutoModelForTokenClassification.from_pretrained("botryan96/GeoBERT")
#Define the pipeline
from transformers import pipeline
ner_machine = pipeline('ner',model = models,tokenizer=tokenizer, aggregation_strategy="simple")
#Define the sentence
sentence = 'In North America, the water storage in the seepage face model is higher than the base case because positive pore pressure is requisite for drainage through a seepage face boundary condition. The result from the resistivity data supports the notion, especially in the northern part of the Sandstone Sediment formation. The active formation of America has a big potential for Oil and Gas based on the seismic section, has been activated since the Paleozoic'
#Deploy the NER Machine
ner_machine(sentence)
- Downloads last month
- 185
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.