---
language: en
widget:
 - text: 'define \"dread\": The overwhelming amount of filled him with dread'
---

# T5-define  

(This model is still a work in progress. If you use it for fine-tuning, make sure to save a local copy.)

This model is trained to generate word definitions from a word and a context sentence,
using the subset of WordNet entries that have both an example sentence and a definition.
The model uses task prompts of the format 'define "[word]": [example sentence]'.
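
For illustration, a prompt on this format can be assembled from a word and an example sentence like so (`make_prompt` is a hypothetical helper, not part of the released code):

```python
def make_prompt(word: str, example: str) -> str:
    # Format a (word, example sentence) pair into the prompt the model expects:
    # define "[word]": [example sentence]
    return f'define "{word}": {example}'

print(make_prompt("noseplow", "The children hid as the noseplow drove across the street"))
# define "noseplow": The children hid as the noseplow drove across the street
```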

To my knowledge, this is the first public model trained on a word definition task.
Similar work: [Zero-shot Word Sense Disambiguation using Sense Definition Embeddings](https://aclanthology.org/P19-1568.pdf)

For this project, there are two objectives:
1. Explore how well the model generalizes when generating definitions for unseen words
2. Explore the utility of the word embeddings produced by definition models

How to run:
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("marksverdhei/t5-base-define")
model = T5ForConditionalGeneration.from_pretrained("marksverdhei/t5-base-define")

# Prompt format: define "[word]": [example sentence]
prompt = "define \"noseplow\": The children hid as the noseplow drove across the street"

# Tokenize the prompt and generate a definition
ids = tokenizer(prompt, return_tensors="pt").input_ids
generated_tokens = model.generate(ids)[0]

# Decode the generated definition, dropping the pad and end-of-sequence tokens
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
```
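
The snippet above uses greedy decoding. Generation parameters can be adjusted for potentially better definitions; the values below are illustrative choices, not settings tuned or recommended by the author:

```python
# Illustrative only: beam search with a length cap,
# reusing `model`, `tokenizer`, and `ids` from the snippet above.
generated = model.generate(ids, num_beams=4, max_new_tokens=32, early_stopping=True)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```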