---
language: en
widget:
- text: 'define "toecoin": toecoin rose by 200% after Elon Musk mentioned it in his tweet'
datasets:
- 'marksverdhei/wordnet-definitions-en-2021'
---
|
|
|
# T5-define
|
|
|
(This model is still a work in progress. If you use it for fine-tuning, make sure to save a local copy.)
|
|
|
This model is trained to generate word definitions from a word and a context, using the subset of WordNet entries that have both an example sentence and a definition. The model uses task prompts of the format `define "[word]": [example sentence]`.
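
For illustration, here is a minimal sketch of building such a prompt; `make_prompt` is a hypothetical helper, not part of the model or dataset, and the example text is taken from the widget above:

```python
# Hypothetical helper; the model only specifies the prompt string format
def make_prompt(word: str, example: str) -> str:
    return f'define "{word}": {example}'

prompt = make_prompt(
    "toecoin",
    "toecoin rose by 200% after Elon Musk mentioned it in his tweet",
)
# -> 'define "toecoin": toecoin rose by 200% after Elon Musk mentioned it in his tweet'
```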
|
|
|
To my knowledge, this is the first public model trained on a word definition task.

Similar work: [Zero-shot Word Sense Disambiguation using Sense Definition Embeddings](https://aclanthology.org/P19-1568.pdf)
|
|
|
This model in particular is a one-shot learner for unseen words, as it has to infer the definition from only a single example.
|
|
|
How to run:
|
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("marksverdhei/t5-base-define")
model = T5ForConditionalGeneration.from_pretrained("marksverdhei/t5-base-define")

prompt = "define \"noseplow\": The children hid as the noseplow drove across the street"

# Tokenize the prompt and generate a definition
ids = tokenizer(prompt, return_tensors="pt").input_ids
generated_tokens = model.generate(ids)[0]
# skip_special_tokens drops the leading pad token and the trailing </s>
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
```
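
By default, `generate` uses greedy decoding with a small length budget. If you want longer or alternative definitions, you can pass explicit generation parameters; the values below are illustrative, not tuned settings from this model's training:

```python
# Illustrative settings: beam search with an explicit token budget
generated = model.generate(ids, num_beams=5, max_new_tokens=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```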
|
|
|
See this gist for the source code used to train the model:

https://gist.github.com/marksverdhei/0a13f67e65460b71c05fcf558a6a91ae