---
metrics:
  - mse
  - r_squared
  - mae
datasets:
  - google_wellformed_query
model-index:
  - name: distilroberta-query-wellformedness
    results:
      - task:
          type: text-classification
          name: Text Classification
        metrics:
          - type: loss
            value: 0.06214376166462898
          - type: mse
            value: 0.06214376166462898
            name: Validation Mean Squared Error
          - type: r2
            value: 0.5705611109733582
            name: Validation R-Squared
          - type: mae
            value: 0.1838676631450653
            name: Validation Mean Absolute Error
language:
  - en
---

# DistilRoBERTa-query-wellformedness

This model fine-tunes the DistilRoBERTa base architecture for a regression task on Google's query-wellformedness dataset, which comprises 25,100 queries from the Paralex corpus. Each query was annotated by five raters, who provided a continuous rating indicating the degree to which the query is well-formed.

## Model description

The model scores a query for completeness and grammatical correctness, producing a value between 0 and 1, where 1 indicates a well-formed query.

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model
model_name = "AdamCodd/distilroberta-query-wellformedness"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Sentences
sentences = [
    "The cat and dog in the yard.",  # Incorrect - It should be "The cat and dog are in the yard."
    "she don't like apples.",  # Incorrect - It should be "She doesn't like apples."
    "Is rain sunny days sometimes?",  # Incorrect - It should be "Do sunny days sometimes have rain?"
    "She enjoys reading books and playing chess.",  # Correct
    "How many planets are there in our solar system?"  # Correct
]

# Tokenize the sentences
inputs = tokenizer(sentences, truncation=True, padding=True, return_tensors='pt')

# Get the model's predictions
model.eval()  # Set the model to evaluation mode
with torch.no_grad():  # Disable gradient calculation, as we are only doing inference
    outputs = model(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask']
    )

# The logits hold one regression score per sentence; convert to plain Python floats
predicted_ratings = outputs.logits.squeeze().tolist()

# Print the predicted ratings
for sentence, rating in zip(sentences, predicted_ratings):
    print(f'Sentence: {sentence}')
    print(f'Predicted Rating: {rating}\n')
```

Output:

```
Sentence: The cat and dog in the yard.
Predicted Rating: 0.3482873737812042

Sentence: she don't like apples.
Predicted Rating: 0.07787154614925385

Sentence: Is rain sunny days sometimes?
Predicted Rating: 0.19854165613651276

Sentence: She enjoys reading books and playing chess.
Predicted Rating: 0.9327691793441772

Sentence: How many planets are there in our solar system?
Predicted Rating: 0.9746372103691101
```
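If a binary well-formed / ill-formed decision is needed, the continuous scores can be thresholded. The sketch below is an illustrative assumption, not part of the model: the 0.5 cutoff and the `is_wellformed` helper are made up, and the cutoff should be tuned on a validation set for your application.

```python
# Hypothetical helper: binarize a wellformedness score at a chosen cutoff.
# The 0.5 threshold is an assumption for illustration, not part of the model.
def is_wellformed(score: float, threshold: float = 0.5) -> bool:
    """Return True when the regression score meets the threshold."""
    return score >= threshold

# Rounded scores from the example output above
scores = [0.3483, 0.0779, 0.1985, 0.9328, 0.9746]
labels = [is_wellformed(s) for s in scores]
print(labels)  # [False, False, False, True, True]
```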

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 450
- num_epochs: 5
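The linear scheduler with warmup listed above ramps the learning rate up over the first 450 steps and then decays it linearly to zero. The sketch below illustrates that shape; `total_steps=5000` is a placeholder assumption, since the card does not state the number of optimizer steps per epoch.

```python
def linear_schedule_lr(step, base_lr=2e-5, warmup_steps=450, total_steps=5000):
    """Linear warmup to base_lr, then linear decay to 0.
    total_steps is an assumed placeholder, not a value from the card."""
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr
        return base_lr * step / warmup_steps
    # Decay phase: scale linearly from base_lr down to 0
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

print(linear_schedule_lr(0))     # 0.0 (start of warmup)
print(linear_schedule_lr(450))   # 2e-05 (end of warmup)
print(linear_schedule_lr(5000))  # 0.0 (end of training)
```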

### Training results

Metrics: Mean Squared Error (MSE), R-Squared (R²), Mean Absolute Error (MAE)

```
'test_loss': 0.06214376166462898,
'test_mse': 0.06214376166462898,
'test_r2': 0.5705611109733582,
'test_mae': 0.1838676631450653
```
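For reference, the three reported metrics can be computed from predictions and targets as below. The definitions are standard; the toy `y_true`/`y_pred` values are made up for illustration and are not the model's actual test data.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """R-Squared: 1 minus residual sum of squares over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy values for illustration only
y_true = [0.0, 0.2, 0.8, 1.0]
y_pred = [0.1, 0.3, 0.7, 0.9]
print(round(mse(y_true, y_pred), 6))  # 0.01
print(round(mae(y_true, y_pred), 6))  # 0.1
```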

### Framework versions

- Transformers 4.34.1
- PyTorch Lightning 2.1.0
- Tokenizers 0.14.1

If you want to support me, you can here.