---
license: apache-2.0
datasets:
- allenai/mslr2022
language:
- en
pipeline_tag: summarization
---
# Longformer for biomedical extractive summarization
## Description
Longformer fine-tuned on the [MS^2](https://github.com/allenai/mslr-shared-task) dataset for extractive summarization.\
The model architecture is similar to [BERTSum](https://arxiv.org/abs/1908.08345).\
The training code is available at [biomed-ext-summ](https://github.com/NotXia/biomed-ext-summ).
## Usage
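```python
from transformers import AutoTokenizer, pipeline

# trust_remote_code is needed because the pipeline code is loaded from the model repository.
summarizer = pipeline("summarization",
    model = "NotXia/longformer-bio-ext-summ",
    tokenizer = AutoTokenizer.from_pretrained("NotXia/longformer-bio-ext-summ"),
    trust_remote_code = True,
    device = 0  # first GPU; use -1 to run on CPU
)

# The document is passed as a list of sentences.
sentences = ["sent1.", "sent2.", "sent3?"]
summarizer({"sentences": sentences}, strategy="count", strategy_args=2)
# Example output: (['sent1.', 'sent2.'], [0, 1])
```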
### Strategies
The `strategy` argument selects how the summary is built from the document:
- `length`: summary with a maximum length (`strategy_args` is the maximum length).
- `count`: summary with the given number of sentences (`strategy_args` is the number of sentences).
- `ratio`: summary proportional to the length of the document (`strategy_args` is the ratio, between 0 and 1).
- `threshold`: summary containing only the sentences whose score is higher than a given value (`strategy_args` is the minimum score).
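
To make the difference between the strategies concrete, here is a minimal sketch of how such a selection step could work once each sentence has a score. It is not the model's actual implementation: `select_sentences` is a hypothetical helper, the scores are made up, and two details are assumptions, namely that `length` is counted in whitespace-separated words and that `ratio` applies to the number of sentences.

```python
from typing import List, Tuple, Union


def select_sentences(
    sentences: List[str],
    scores: List[float],
    strategy: str,
    strategy_args: Union[int, float],
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: pick sentences given one score per sentence."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")

    # Sentence indices ranked from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    if strategy == "count":
        chosen = ranked[:int(strategy_args)]
    elif strategy == "ratio":
        # Assumption: the ratio is applied to the number of sentences.
        chosen = ranked[:int(len(sentences) * strategy_args)]
    elif strategy == "threshold":
        chosen = [i for i in ranked if scores[i] > strategy_args]
    elif strategy == "length":
        # Assumption: length is measured in whitespace-separated words.
        chosen, used = [], 0
        for i in ranked:
            n_words = len(sentences[i].split())
            if used + n_words > strategy_args:
                break
            chosen.append(i)
            used += n_words
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    # Return the selection in document order, together with the indices.
    chosen.sort()
    return [sentences[i] for i in chosen], chosen


# Made-up scores, for illustration only.
doc = ["a b c.", "d e.", "f?", "g h i j."]
doc_scores = [0.9, 0.2, 0.7, 0.4]
print(select_sentences(doc, doc_scores, "threshold", 0.3))
```

Returning the indices alongside the sentences mirrors the tuple shown in the usage example and lets the caller map the summary back to the original document.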