ojobert / README.md
ihk's picture
Update README.md
7add91c
metadata
base_model: distilbert-base-uncased
model-index:
  - name: ojobert
    results: []
license: mit
language:
  - en
widget:
  - text: Would you like to join a major [MASK] company?
tags:
  - jobs

Nesta, the UK's innovation agency, has been scraping online job adverts since 2021 and building algorithms to extract and structure information as part of the Open Jobs Observatory project.

Although we are unable to share the raw data openly, we aim to open source our models, algorithms and tools so that anyone can use them for their own research and analysis.

πŸ“Ÿ About

This model is pre-trained from a distilbert-base-uncased checkpoint on 100k sentences from scraped online job postings as part of the Open Jobs Observatory.

πŸ–¨οΈ Use

To use the model:

from transformers import pipeline

model = pipeline('fill-mask', model='ihk/ojobert', tokenizer='ihk/ojobert')

An example use is as follows:


text = "Would you like to join a major [MASK] company?"
results = model(text, top_k=3)

results

>> [{'score': 0.1886572688817978,
  'token': 13859,
  'token_str': 'pharmaceutical',
  'sequence': 'would you like to join a major pharmaceutical company?'},
 {'score': 0.07436735928058624,
  'token': 5427,
  'token_str': 'insurance',
  'sequence': 'would you like to join a major insurance company?'},
 {'score': 0.06400047987699509,
  'token': 2810,
  'token_str': 'construction',
  'sequence': 'would you like to join a major construction company?'}]

βš–οΈ Training results

The fine-tuning metrics are as follows:

  • eval_loss: 2.5871026515960693
  • eval_runtime: 134.4452
  • eval_samples_per_second: 14.281
  • eval_steps_per_second: 0.223
  • epoch: 3.0
  • perplexity: 13.29