NotXia committed on
Commit
a3a81b7
1 Parent(s): 6f408cf

Update README.md

Files changed (1)
  1. README.md +33 -0
README.md CHANGED
@@ -1,3 +1,36 @@
  ---
  license: apache-2.0
+ datasets:
+ - allenai/mslr2022
+ language:
+ - en
+ pipeline_tag: summarization
  ---
+
+ # Longformer for biomedical extractive summarization
+
+ ## Description
+ Longformer fine-tuned on [MS^2](https://github.com/allenai/mslr-shared-task) for extractive summarization.\
+ Model architecture is similar to [BERTSum](https://arxiv.org/abs/1908.08345).\
+ Training code is available at [biomed-ext-summ](https://github.com/NotXia/biomed-ext-summ).
+
+ ## Usage
+ ```python
+ from transformers import AutoTokenizer, pipeline
+
+ # trust_remote_code=True lets transformers run the custom code shipped in the model repository.
+ summarizer = pipeline("summarization",
+     model = "NotXia/longformer-bio-ext-summ",
+     tokenizer = AutoTokenizer.from_pretrained("NotXia/longformer-bio-ext-summ"),
+     trust_remote_code = True,
+     device = 0  # first GPU
+ )
+
+ sentences = ["sent1.", "sent2.", "sent3?"]
+ summarizer({"sentences": sentences}, strategy="count", strategy_args=2)
+ # Output: (['sent1.', 'sent2.'], [0, 1])
+ ```
+
+ ### Strategies
+ Strategies for selecting the summary sentences:
+ - `length`: summary with a maximum length (`strategy_args` is the maximum length).
+ - `count`: summary with the given number of sentences (`strategy_args` is the number of sentences).
+ - `ratio`: summary proportional to the length of the document (`strategy_args` is a ratio in [0, 1]).
+ - `threshold`: summary containing only sentences with a score higher than a given value (`strategy_args` is the minimum score).
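
The four strategies added in this README can be read as different ways of selecting sentences from per-sentence scores. The following is a minimal sketch under stated assumptions, not the repository's implementation: `select_sentences` is a hypothetical helper, the scores in the usage note are made up, `ratio` is assumed to apply to the number of sentences (rounded down), and `length` is assumed to be counted in whitespace-separated words. The return value mirrors the `(sentences, indices)` shape shown in the usage example.

```python
from typing import List, Sequence, Tuple, Union


def select_sentences(
    sentences: Sequence[str],
    scores: Sequence[float],
    strategy: str,
    strategy_args: Union[int, float],
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: pick sentences by score under one of four strategies."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")

    # Sentence indices ranked from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    if strategy == "count":
        chosen = ranked[: int(strategy_args)]
    elif strategy == "ratio":
        # Assumption: the ratio applies to the sentence count, rounded down.
        chosen = ranked[: int(len(sentences) * strategy_args)]
    elif strategy == "threshold":
        chosen = [i for i in ranked if scores[i] > strategy_args]
    elif strategy == "length":
        # Assumption: length is counted in whitespace-separated words.
        # Selection stops at the first ranked sentence that would exceed the budget.
        chosen, used = [], 0
        for i in ranked:
            n_words = len(sentences[i].split())
            if used + n_words > strategy_args:
                break
            chosen.append(i)
            used += n_words
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    chosen.sort()  # restore document order
    return [sentences[i] for i in chosen], chosen
```

For example, with the made-up scores `[0.9, 0.7, 0.2]` for `["sent1.", "sent2.", "sent3?"]`, calling `select_sentences(sentences, scores, "count", 2)` keeps the two highest-scoring sentences in their original order.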