led-large-book-summary

This model is a fine-tuned version of allenai/led-large-16384 on the BookSum dataset (kmfoda/booksum). It aims to generalize well and be useful in summarizing lengthy text for both academic and everyday purposes.

Handles inputs of up to 16,384 tokens
See the Colab demo linked above or try the demo on Spaces

Note: Due to inference API timeout constraints, outputs may be truncated before the full summary is returned (try Python or the demo instead).

Basic Usage

To improve summary quality, pass encoder_no_repeat_ngram_size=3 when calling the pipeline object. This setting prevents any 3-gram from the input text from being copied verbatim into the output, which pushes the model toward new wording and a more abstractive summary.

Load the model into a pipeline object:

import torch
from transformers import pipeline

hf_name = 'pszemraj/led-large-book-summary'

summarizer = pipeline(
    "summarization",
    hf_name,
    device=0 if torch.cuda.is_available() else -1,
)

Feed the text into the pipeline object:

wall_of_text = "your words here"

result = summarizer(
    wall_of_text,
    min_length=16,
    max_length=256,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=3,
    repetition_penalty=3.5,
    num_beams=4,
    early_stopping=True,
)

# The pipeline returns a list with one dict per input text.
print(result[0]["summary_text"])

Important: For optimal summary quality, use the global attention mask when decoding, as demonstrated in this community notebook. See the definition of generate_answer(batch) there.
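As a rough illustration of what using the global attention mask can look like outside the pipeline, here is a minimal sketch. It assumes the common LED convention of placing global attention on the first token only; the helper names build_global_attention_mask and summarize_with_global_attention are hypothetical, and the generation settings are simply copied from the pipeline example rather than taken from the notebook.

```python
import torch


def build_global_attention_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Return a mask that enables global attention on the first token only.

    Assumption: first-token global attention is the usual LED setup for
    summarization; other tokens keep local (windowed) attention.
    """
    global_attention_mask = torch.zeros_like(attention_mask)
    global_attention_mask[:, 0] = 1
    return global_attention_mask


def summarize_with_global_attention(
    text: str, hf_name: str = "pszemraj/led-large-book-summary"
) -> str:
    """Hypothetical helper: summarize `text` with an explicit global attention mask."""
    # Imported lazily because loading transformers and the model is heavy.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(hf_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(hf_name)

    # Truncate to the 16,384-token limit of the model.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=16384)

    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        global_attention_mask=build_global_attention_mask(inputs["attention_mask"]),
        max_length=256,
        no_repeat_ngram_size=3,
        num_beams=4,
        early_stopping=True,
    )
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]
```

Calling summarize_with_global_attention("your words here") downloads the model on first use, so it is best tried on a machine with enough memory for the large checkpoint.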

If you're facing computing constraints, consider using the base version pszemraj/led-base-book-summary.

Training Information

Data

The model was fine-tuned on the booksum dataset. During training, the chapter column was the input, while the summary_text column was the target output.

Procedure

Fine-tuning was run on the BookSum dataset across 13+ epochs. Notably, the final four epochs combined the training and validation sets as 'train' to enhance generalization.

Hyperparameters

The training process involved different settings across stages:

Initial Three Epochs: Low learning rate (5e-05), batch size of 1, 4 gradient accumulation steps, and a linear learning rate scheduler.
In-between Epochs: Learning rate reduced to 4e-05, increased batch size to 2, 16 gradient accumulation steps, and switched to a cosine learning rate scheduler with a 0.05 warmup ratio.
Final Two Epochs: Further reduced learning rate (2e-05), batch size reverted to 1, maintained gradient accumulation steps at 16, and continued with a cosine learning rate scheduler, albeit with a lower warmup ratio (0.03).
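To make the three stages easier to compare, the settings above can be written down as plain data. This is a sketch, not the original training configuration: the StageConfig class is hypothetical, its field names merely mirror the argument names used by the Transformers Trainer, and the effective batch size assumes a single device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageConfig:
    """One training stage as described in the list above (hypothetical container)."""

    name: str
    learning_rate: float
    per_device_train_batch_size: int
    gradient_accumulation_steps: int
    lr_scheduler_type: str
    warmup_ratio: Optional[float]  # None where the card states no value

    @property
    def effective_batch_size(self) -> int:
        # Assumption: training on a single device.
        return self.per_device_train_batch_size * self.gradient_accumulation_steps


STAGES = [
    StageConfig("initial three epochs", 5e-5, 1, 4, "linear", None),
    StageConfig("in-between epochs", 4e-5, 2, 16, "cosine", 0.05),
    StageConfig("final two epochs", 2e-5, 1, 16, "cosine", 0.03),
]

for stage in STAGES:
    print(f"{stage.name}: lr={stage.learning_rate}, "
          f"effective batch size={stage.effective_batch_size}")
# Effective batch sizes under the single-device assumption: 4, 32, 16
```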

Versions

Transformers 4.19.2
Pytorch 1.11.0+cu113
Datasets 2.2.2
Tokenizers 0.12.1

Simplified Usage with TextSum

To streamline the process of using this and other models, I've developed a Python package utility named textsum. This package offers simple interfaces for applying summarization models to text documents of arbitrary length.

Install TextSum:

pip install textsum

Then use it in Python with this model:

from textsum.summarize import Summarizer

model_name = "pszemraj/led-large-book-summary"
summarizer = Summarizer(
    model_name_or_path=model_name,  # you can use any Seq2Seq model on the Hub
    token_batch_length=4096,  # tokens to batch summarize at a time, up to 16384
)
long_string = "This is a long string of text that will be summarized."
out_str = summarizer.summarize_string(long_string)
print(f"summary: {out_str}")

Currently implemented interfaces include a Python API, a Command-Line Interface (CLI), and a demo/web UI.

For detailed explanations and documentation, check the README or the wiki.
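To give an intuition for what token_batch_length controls, the sketch below splits a token sequence into consecutive windows that could each be summarized separately. It is an assumption-laden illustration, not textsum's actual implementation: the function name split_into_token_batches is hypothetical, and the real package may handle window boundaries differently.

```python
from typing import List


def split_into_token_batches(
    token_ids: List[int], token_batch_length: int = 4096
) -> List[List[int]]:
    """Split token ids into consecutive windows of at most token_batch_length."""
    if token_batch_length <= 0:
        raise ValueError("token_batch_length must be positive")
    return [
        token_ids[start : start + token_batch_length]
        for start in range(0, len(token_ids), token_batch_length)
    ]


# Toy example: 10,000 placeholder token ids split into 4,096-token windows.
batches = split_into_token_batches(list(range(10_000)), token_batch_length=4096)
print([len(batch) for batch in batches])  # [4096, 4096, 1808]
```

Smaller windows lower memory use per call but give the model less context for each partial summary.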

Related Models

Check out these other related models, also trained on the BookSum dataset:

LED-large continued - experiment with further fine-tuning
Long-T5-tglobal-base
BigBird-Pegasus-Large-K
Pegasus-X-Large
Long-T5-tglobal-XL

Other variants trained on other datasets are available on my Hugging Face profile; feel free to try them out :)

Model size: 460M params (Safetensors, tensor type F32)

Evaluation results

Verified scores on the test set of each dataset:

Dataset          Metric       Score
kmfoda/booksum   ROUGE-1      31.731
kmfoda/booksum   ROUGE-2      5.331
kmfoda/booksum   ROUGE-L      16.146
kmfoda/booksum   ROUGE-LSUM   29.088
kmfoda/booksum   loss         4.816
kmfoda/booksum   gen_len      154.904
samsum           ROUGE-1      33.448
samsum           ROUGE-2      10.425
samsum           ROUGE-L      24.580
samsum           ROUGE-LSUM   29.823
