NotXia/pubmedbert-bio-ext-summ

NotXia committed on Jul 31, 2023

Commit f71a65f • 1 Parent(s): 04f3f18

Update README.md

Files changed (1)

README.md +34 -0

@@ -1,3 +1,37 @@
 ---
 license: apache-2.0
+datasets:
+- allenai/mslr2022
+language:
+- en
+pipeline_tag: summarization
 ---
+# PubMedBERT for biomedical extractive summarization
+## Description
+[PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) fine-tuned
+on [MS^2](https://github.com/allenai/mslr-shared-task) for extractive summarization.\
+Model architecture is similar to [BERTSum](https://github.com/nlpyang/BertSum).\
+Training code is available at [biomed-ext-summ](https://github.com/NotXia/biomed-ext-summ).
+## Usage
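+```python
+from transformers import pipeline, AutoTokenizer
+
+# trust_remote_code = True lets transformers load the custom code shipped with the model repository.
+summarizer = pipeline("summarization",
+  model = "NotXia/pubmedbert-bio-ext-summ",
+  tokenizer = AutoTokenizer.from_pretrained("NotXia/pubmedbert-bio-ext-summ"),
+  trust_remote_code = True,
+  device = 0  # first GPU; use -1 to run on CPU
+)
+
+# The document is passed as a list of sentences.
+sentences = ["sent1.", "sent2.", "sent3?"]
+summarizer({"sentences": sentences}, strategy="count", strategy_args=2)
+# >>> (['sent1.', 'sent2.'], [0, 1])
+```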
+### Strategies
+The `strategy` argument selects how the summary is built from the document:
+- `length`: summary with a maximum length (`strategy_args` is the maximum length).
+- `count`: summary with the given number of sentences (`strategy_args` is the number of sentences).
+- `ratio`: summary proportional to the length of the document (`strategy_args` is the ratio, in the range [0, 1]).
+- `threshold`: summary containing only the sentences whose score is higher than a given value (`strategy_args` is the minimum score).
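
To make the four strategies concrete, the sketch below shows one way sentence selection could work once each sentence has a score. It is not the code used by this model: `select_sentences` is a hypothetical helper, the scores are made up, `length` is assumed to be measured in characters, and `ratio` is assumed to be applied to the number of sentences.

```python
from typing import List, Tuple


def select_sentences(
    sentences: List[str],
    scores: List[float],
    strategy: str,
    strategy_args: float,
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: pick sentences by score and return them in document order."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")

    # Indices sorted from highest to lowest score (ties keep document order).
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])

    if strategy == "count":
        chosen = ranked[: int(strategy_args)]
    elif strategy == "ratio":
        # Assumption: the ratio is applied to the number of sentences.
        chosen = ranked[: int(len(sentences) * strategy_args)]
    elif strategy == "threshold":
        chosen = [i for i in ranked if scores[i] > strategy_args]
    elif strategy == "length":
        # Assumption: length is counted in characters; greedily keep sentences that still fit.
        chosen, used = [], 0
        for i in ranked:
            if used + len(sentences[i]) <= strategy_args:
                chosen.append(i)
                used += len(sentences[i])
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    chosen = sorted(chosen)  # restore document order
    return [sentences[i] for i in chosen], chosen


# Made-up scores, for illustration only.
sentences = ["sent1.", "sent2.", "sent3?"]
scores = [0.9, 0.7, 0.2]
print(select_sentences(sentences, scores, "count", 2))
print(select_sentences(sentences, scores, "threshold", 0.5))
```

With these made-up scores, both calls keep the first two sentences. The return shape (selected sentences, then their indices) mirrors the output shown in the usage example.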