Nesta, the UK's innovation agency, has been scraping online job adverts since 2021 and building algorithms to extract and structure information as part of the Open Jobs Observatory project.
Although we are unable to share the raw data openly, we aim to open source our models, algorithms and tools so that anyone can use them for their own research and analysis.
## About
This model was pre-trained from a distilbert-base-uncased checkpoint on 100k sentences from online job postings scraped as part of the Open Jobs Observatory.
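For reference, below is a minimal sketch of this kind of continued masked-language-model pre-training with the Hugging Face `Trainer`. The dataset file name, tokenisation settings and hyperparameters are illustrative, not the settings actually used for ojobert.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")

# Illustrative input: a text file with one job-advert sentence per line.
dataset = load_dataset("text", data_files={"train": "job_ad_sentences.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens so the model is trained on the fill-mask objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ojobert-mlm", num_train_epochs=3),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```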
## Use
To use the model:

```python
from transformers import pipeline

model = pipeline('fill-mask', model='ihk/ojobert', tokenizer='ihk/ojobert')
```
An example use is as follows:
text = "Would you like to join a major [MASK] company?"
results = model(text, top_k=3)
results
>> [{'score': 0.1886572688817978,
'token': 13859,
'token_str': 'pharmaceutical',
'sequence': 'would you like to join a major pharmaceutical company?'},
{'score': 0.07436735928058624,
'token': 5427,
'token_str': 'insurance',
'sequence': 'would you like to join a major insurance company?'},
{'score': 0.06400047987699509,
'token': 2810,
'token_str': 'construction',
'sequence': 'would you like to join a major construction company?'}]
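If you would rather work with the tokenizer and model objects directly instead of through a pipeline, a minimal sketch using the generic `transformers` Auto classes (nothing here is specific to ojobert beyond the checkpoint name) is:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ihk/ojobert")
model = AutoModelForMaskedLM.from_pretrained("ihk/ojobert")

text = "Would you like to join a major [MASK] company?"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the three highest-scoring tokens for it.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_positions[0]].topk(3).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```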
## Training results
The fine-tuning metrics are as follows:
- eval_loss: 2.5871
- eval_runtime: 134.45 s
- eval_samples_per_second: 14.28
- eval_steps_per_second: 0.22
- epoch: 3.0
- perplexity: 13.29
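The reported perplexity is consistent with exponentiating the evaluation loss, which is the usual way perplexity is derived for a masked-language-modelling evaluation:

```python
import math

eval_loss = 2.5871  # mean evaluation cross-entropy from the run above
print(round(math.exp(eval_loss), 2))  # 13.29
```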