license: bigscience-openrail-m
tags:
  - split and rephrase

T5 model for splitting complex English sentences into simple sentences

Split-and-rephrase is the task of splitting a complex input sentence into shorter sentences while preserving its meaning (Narayan et al., 2017).

E.g.:

Cystic Fibrosis (CF) is an autosomal recessive disorder that affects multiple organs,
which is common in the Caucasian population, symptomatically affecting 1 in 2500 newborns in the UK,
and more than 80,000 individuals globally.

could be split into

Cystic Fibrosis is an autosomal recessive disorder that affects multiple organs. 
Cystic Fibrosis is common in the Caucasian population.
Cystic Fibrosis affects 1 in 2500 newborns in the UK. 
Cystic Fibrosis affects more than 80,000 individuals globally.

How to use it in your code:

from transformers import T5Tokenizer, T5ForConditionalGeneration
checkpoint="unikei/t5-base-split-and-rephrase"
tokenizer = T5Tokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

complex_sentence = "Cystic Fibrosis (CF) is an autosomal recessive disorder that \
affects multiple organs, which is common in the Caucasian \
population, symptomatically affecting 1 in 2500 newborns in \
the UK, and more than 80,000 individuals globally."
# Tokenize the input, padding or truncating it to 256 tokens
complex_tokenized = tokenizer(complex_sentence,
                              padding="max_length",
                              truncation=True,
                              max_length=256,
                              return_tensors='pt')

# Generate with beam search (5 beams), producing at most 256 output tokens
simple_tokenized = model.generate(complex_tokenized['input_ids'],
                                  attention_mask=complex_tokenized['attention_mask'],
                                  max_length=256,
                                  num_beams=5)
simple_sentences = tokenizer.batch_decode(simple_tokenized, skip_special_tokens=True)
print(simple_sentences)

"""
Output:
Cystic Fibrosis is an autosomal recessive disorder that affects multiple organs. Cystic Fibrosis affects 1 in 2500 newborns in the UK. Cystic Fibrosis affects more than 80,000 individuals globally. Cystic Fibrosis is common in the Caucasian population.
"""
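
`batch_decode` returns a list with one string per input, so all of the simple sentences for a given input arrive as a single string. If you need them as separate items, a minimal sketch of a post-processing step could look like the following. The helper `split_into_sentences` is hypothetical (it is not part of this model or of `transformers`), and it assumes each generated sentence ends in `.`, `!` or `?` followed by whitespace, so abbreviations such as "e.g. this" would be split incorrectly.

```python
import re
from typing import List


def split_into_sentences(generated: str) -> List[str]:
    """Naively split one decoded model output into individual sentences.

    Hypothetical helper: assumes sentences end with '.', '!' or '?'
    followed by whitespace.
    """
    # Split at whitespace that directly follows sentence-ending punctuation.
    parts = re.split(r"(?<=[.!?])\s+", generated.strip())
    # Drop empty fragments (e.g. when the input string is empty).
    return [part for part in parts if part]


# The example output shown above, as a single decoded string.
example_output = (
    "Cystic Fibrosis is an autosomal recessive disorder that affects multiple organs. "
    "Cystic Fibrosis affects 1 in 2500 newborns in the UK. "
    "Cystic Fibrosis affects more than 80,000 individuals globally. "
    "Cystic Fibrosis is common in the Caucasian population."
)

sentences = split_into_sentences(example_output)
for sentence in sentences:
    print(sentence)
```

With the model loaded, the same helper could be applied to each element of `simple_sentences`. For text with abbreviations or decimal numbers followed by spaces, a proper sentence tokenizer would be a safer choice than this regular expression.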