---
license: apache-2.0
datasets:
- allenai/mslr2022
language:
- en
pipeline_tag: summarization
---

# PubMedBERT for biomedical extractive summarization

## Description
This model was developed as part of my [Bachelor's thesis](https://amslaurea.unibo.it/id/eprint/29686).

It is [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) fine-tuned
on [MS^2](https://github.com/allenai/mslr-shared-task) for extractive summarization.\
The model architecture is similar to [BERTSum](https://github.com/nlpyang/BertSum).\
Training code is available at [biomed-ext-summ](https://github.com/NotXia/biomed-ext-summ).
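
As a rough illustration of what a BERTSum-style input looks like: BERTSum places a `[CLS]` token in front of every sentence and alternates segment ids between consecutive sentences, so that each sentence gets its own representation to score. The sketch below shows only that layout. The function name `build_bertsum_input` is hypothetical, whitespace splitting stands in for the real WordPiece tokenizer, and whether this model follows the same layout in every detail is an assumption; the training repository linked above is the authoritative source.

```python
# Illustrative sketch of BERTSum-style input construction (not this model's code).
def build_bertsum_input(sentences):
    """Return (tokens, segment_ids, cls_positions) for a list of sentences."""
    tokens, segment_ids, cls_positions = [], [], []
    for i, sentence in enumerate(sentences):
        # Assumption: whitespace splitting replaces a real WordPiece tokenizer.
        piece = ["[CLS]"] + sentence.lower().split() + ["[SEP]"]
        cls_positions.append(len(tokens))         # index of this sentence's [CLS]
        tokens.extend(piece)
        segment_ids.extend([i % 2] * len(piece))  # alternate 0/1 per sentence
    return tokens, segment_ids, cls_positions


tokens, segment_ids, cls_positions = build_bertsum_input(
    ["Aspirin reduced pain.", "No adverse events occurred."]
)
print(tokens)
print(segment_ids)
print(cls_positions)
```

In a design of this kind, the encoder output at each `[CLS]` position is fed to a small classifier that produces one relevance score per sentence.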

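## Usage
```python
from transformers import AutoTokenizer, pipeline

# Load the summarization pipeline together with the model's tokenizer.
summarizer = pipeline(
    "summarization",
    model="NotXia/pubmedbert-bio-ext-summ",
    tokenizer=AutoTokenizer.from_pretrained("NotXia/pubmedbert-bio-ext-summ"),
    trust_remote_code=True,  # allows running the custom code from the model repository
    device=0,  # first GPU; use device=-1 to run on CPU
)

# The document is passed as a list of sentences.
sentences = ["sent1.", "sent2.", "sent3?"]
summarizer({"sentences": sentences}, strategy="count", strategy_args=2)
# Output:
# (['sent1.', 'sent2.'], [0, 1])
```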

### Strategies
The `strategy` argument controls how sentences are selected for the summary, and `strategy_args` supplies its parameter:
- `length`: builds a summary up to a maximum length (`strategy_args` is the maximum length).
- `count`: selects a fixed number of sentences (`strategy_args` is the number of sentences).
- `ratio`: builds a summary proportional to the length of the document (`strategy_args` is a ratio in [0, 1]).
- `threshold`: keeps only the sentences whose score is higher than a given value (`strategy_args` is the minimum score).
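
To make the four strategies concrete, here is a minimal sketch of how per-sentence scores could be turned into a summary. It is not the pipeline's actual implementation: `select_sentences` is a hypothetical helper, the scores are made up, and measuring `length` in whitespace-separated words is an assumption, since the unit is not stated above. The return shape (selected sentences plus their indices) mirrors the usage example.

```python
# Hypothetical helper illustrating the selection strategies (not the pipeline's code).
def select_sentences(sentences, scores, strategy, strategy_args):
    """Pick sentences by score; return them in document order with their indices."""
    # Sentence indices from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    if strategy == "count":
        chosen = ranked[:strategy_args]
    elif strategy == "ratio":
        chosen = ranked[:int(len(sentences) * strategy_args)]
    elif strategy == "threshold":
        chosen = [i for i in ranked if scores[i] > strategy_args]
    elif strategy == "length":
        # Assumption: length is counted in whitespace-separated words.
        chosen, used = [], 0
        for i in ranked:
            n_words = len(sentences[i].split())
            if used + n_words > strategy_args:
                break
            chosen.append(i)
            used += n_words
    else:
        raise ValueError(f"Unknown strategy: {strategy}")

    chosen = sorted(chosen)  # restore document order
    return [sentences[i] for i in chosen], chosen


sentences = ["a b c", "d e", "f g h i", "j"]
scores = [0.9, 0.2, 0.7, 0.4]  # made-up scores for illustration
print(select_sentences(sentences, scores, "count", 2))
print(select_sentences(sentences, scores, "threshold", 0.5))
```

Returning the indices alongside the sentences lets the caller map the summary back to positions in the original document.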