
FLAN-T5 small-WordNet

This model is a fine-tuned version of flan-t5-small on the WordNet dataset.

Model description

The model is trained to classify terms into one of four term types: noun, verb, adjective or adverb. It learns these types during fine-tuning and generates at most one type for any given term.
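Because the model generates its answer as free text, a caller will usually want to map that output back onto the four types. The sketch below is a hypothetical post-processing helper (`parse_type` and `TERM_TYPES` are illustrative names, not part of the model). It assumes the output is either a bare label or a label prefixed with `Type:`, as in the examples in this card, and it returns at most one type, or `None` for anything outside the four classes.

```python
from typing import Optional

# The four term types named in the model description.
TERM_TYPES = ("noun", "verb", "adjective", "adverb")


def parse_type(generated: str) -> Optional[str]:
    """Map raw generated text to a single term type, or None if unrecognized.

    Assumes (not documented by the card) that the model emits either a bare
    label such as "noun" or a prefixed one such as "Type: noun".
    """
    text = generated.strip().lower()
    # Drop an optional "type:" prefix like the one shown in the examples.
    if text.startswith("type:"):
        text = text[len("type:"):].strip()
    # Keep only the first token, so at most one type is ever returned.
    first = text.split()[0].strip(".,;") if text else ""
    return first if first in TERM_TYPES else None


print(parse_type("Type: noun"))   # noun
print(parse_type(" Verb. "))      # verb
print(parse_type("preposition"))  # None
```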

The model also works well as part of a Retrieval-Augmented Generation (RAG) pipeline that draws on an external knowledge source, specifically WordNet Semantic Primes.
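The card does not describe how the RAG pipeline builds its input. The following is a minimal sketch under two assumptions: retrieval is a simple exact-match lookup, and retrieved text is prepended to the query under a `Context:` label. `KNOWLEDGE`, `retrieve`, and `build_rag_prompt` are hypothetical names, and the toy entries stand in for WordNet Semantic Primes rather than reproducing that resource.

```python
from typing import Dict, List

# Hypothetical toy knowledge source standing in for WordNet Semantic Primes.
KNOWLEDGE: Dict[str, List[str]] = {
    "lodge": [
        "lodge (verb): stay somewhere temporarily",
        "lodge (noun): a small house",
    ],
    "question": ["question (noun): a request for information"],
}


def retrieve(term: str, kb: Dict[str, List[str]], k: int = 2) -> List[str]:
    """Return up to k entries for the term using exact, case-insensitive lookup."""
    return kb.get(term.strip().lower(), [])[:k]


def build_rag_prompt(term: str, kb: Dict[str, List[str]]) -> str:
    """Prepend any retrieved entries to the classification query (assumed format)."""
    lines: List[str] = []
    context = retrieve(term, kb)
    if context:
        lines.append("Context:")
        lines.extend(f"- {entry}" for entry in context)
    lines.append(f"Lexical Term L: {term}")
    lines.append("Type:")
    return "\n".join(lines)


print(build_rag_prompt("lodge", KNOWLEDGE))
```

Terms that have no entry in the knowledge source fall back to a plain query with no context block.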

Intended uses and limitations

This model is intended to be used to generate a type (class) for an input term.

Training and evaluation data

The training and evaluation data can be found here.

The training set size is 40,559 and the test set size is 9,470.
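For reference, the two splits add up as follows. This is plain arithmetic on the sizes above and assumes the splits do not overlap.

```python
# Split sizes as reported above.
train_size = 40559
test_size = 9470

# Assumes the two splits are disjoint.
total = train_size + test_size
test_fraction = test_size / total

print(f"total={total}, test fraction={test_fraction:.1%}")  # total=50029, test fraction=18.9%
```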

Example

Here are some examples of the model's capabilities:

  • input:

    • Lexical Term L: question
    • Sentence Containing L (Optional): there was a question about my training
  • output:

    • Type: noun
  • input:

    • Lexical Term L: lodge
    • Sentence Containing L (Optional): Where are you lodging in Paris?
  • output:

    • Type: verb
  • input:

    • Lexical Term L: genus equisetum
    • Sentence Containing L (Optional):
  • output:

    • Type: noun
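Each example pairs a term with an optional context sentence. The sketch below turns such a pair into a text prompt. It assumes the model input uses the same field labels shown above, since the actual training template is not given in this card. `build_prompt` and `EXAMPLES` are hypothetical names.

```python
from typing import Optional


def build_prompt(term: str, sentence: Optional[str] = None) -> str:
    """Format a lexical term and an optional context sentence as one prompt."""
    lines = [
        f"Lexical Term L: {term}",
        # An omitted sentence leaves the field empty, as in the third example.
        f"Sentence Containing L (Optional): {sentence or ''}",
    ]
    # Strip the trailing space left behind by an empty field.
    return "\n".join(line.rstrip() for line in lines)


# The three examples from this card, with the types the card reports.
EXAMPLES = [
    ("question", "there was a question about my training", "noun"),
    ("lodge", "Where are you lodging in Paris?", "verb"),
    ("genus equisetum", None, "noun"),
]

for term, sentence, expected in EXAMPLES:
    print(build_prompt(term, sentence), "->", expected)
```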

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • num_epochs: 5
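These settings can be collected in a configuration dictionary. The keys below are the argument names used by `Seq2SeqTrainingArguments` in Hugging Face `transformers`. Mapping the card's `train_batch_size` and `eval_batch_size` onto the per-device arguments assumes a single-device setup. The card does not say which training script was used.

```python
# Training hyperparameters from the list above, keyed by the argument names
# of transformers' Seq2SeqTrainingArguments (assumes a single device, so the
# per-device batch sizes equal the reported batch sizes).
HYPERPARAMS = {
    "learning_rate": 1e-05,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "num_train_epochs": 5,
}

for name, value in HYPERPARAMS.items():
    print(f"{name}: {value}")
```

With `transformers` installed, the dictionary could be unpacked as `Seq2SeqTrainingArguments(output_dir="flan-t5-small-wordnet", **HYPERPARAMS)`, where the output directory name is only a placeholder.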

Training results

Training Loss   Epoch   Step   Validation Loss
0.1725          1.0     1000   0.0640
0.1250          2.0     2000   0.0535
0.1040          3.0     3000   0.0469
0.0917          4.0     4000   0.0421
0.0830          5.0     5000   0.0384
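The training results can be summarized with a few lines of arithmetic. The sketch below checks that validation loss decreases at every epoch and computes its relative drop from epoch 1 to epoch 5, which comes to about 40%. The `RESULTS` list is a direct transcription of the table.

```python
# (epoch, training loss, validation loss) rows transcribed from the table above.
RESULTS = [
    (1, 0.1725, 0.0640),
    (2, 0.1250, 0.0535),
    (3, 0.1040, 0.0469),
    (4, 0.0917, 0.0421),
    (5, 0.0830, 0.0384),
]

val = [v for _, _, v in RESULTS]
# True if validation loss decreases at every epoch.
monotonic = all(b < a for a, b in zip(val, val[1:]))
# Fractional reduction in validation loss between the first and last epoch.
relative_drop = (val[0] - val[-1]) / val[0]

print(monotonic, f"{relative_drop:.0%}")  # True 40%
```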

Citation

@misc{akl2024dstillms4ol2024task,
      title={DSTI at LLMs4OL 2024 Task A: Intrinsic versus extrinsic knowledge for type classification},
      author={Hanna Abi Akl},
      year={2024},
      eprint={2408.14236},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.14236},
}

Model size: 77M params (Safetensors, tensor type F32)