📝 Description
MBart for Russian summarization, fine-tuned for dialogue summarization.
This model was first fine-tuned by Ilya Gusev on the Gazeta dataset. We then fine-tuned that model on the SAMSum dataset translated into Russian with the Google Translate API.
🤗 Moreover! We have implemented a Telegram bot, @summarization_bot, that runs inference with this model. Add it to a chat and get summaries instead of dozens of spam messages! 🤗
❓ How to use with code
from transformers import AutoTokenizer, MBartForConditionalGeneration
# Download model and tokenizer
model_name = "Kirili4ik/mbart_ruDialogSum"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)
model.eval()
article_text = "..."
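# Tokenize the dialogue, padding or truncating it to 600 tokens
input_ids = tokenizer(
    [article_text],
    max_length=600,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)["input_ids"]
# Generate the summary with beam search, blocking repeated trigrams
output_ids = model.generate(
    input_ids=input_ids,
    top_k=0,
    num_beams=3,
    no_repeat_ngram_size=3,
)[0]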
summary = tokenizer.decode(output_ids, skip_special_tokens=True)
print(summary)
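The snippet above leaves `article_text` as a placeholder, and this card does not state the exact input format. SAMSum dialogues are written as one `Speaker: utterance` line per message, so a reasonable guess is to build the input the same way. The sketch below does that; `format_dialogue` is a hypothetical helper introduced here for illustration, not part of the model or of `transformers`.

```python
from typing import Iterable, Tuple


def format_dialogue(messages: Iterable[Tuple[str, str]]) -> str:
    """Join (speaker, text) pairs into one 'Speaker: text' line per message.

    Assumption: the model expects SAMSum-style dialogue text.
    """
    lines = []
    for speaker, text in messages:
        # Collapse internal whitespace so each message stays on one line
        cleaned = " ".join(text.split())
        if cleaned:  # skip messages that are empty after cleaning
            lines.append(f"{speaker.strip()}: {cleaned}")
    return "\n".join(lines)


messages = [
    ("Аня", "Привет! Ты придёшь  сегодня на встречу?"),
    ("Борис", "Да, буду в шесть."),
    ("Аня", "   "),
]
article_text = format_dialogue(messages)
print(article_text)
```

The resulting string can be passed as `article_text` to the tokenizer call above. Long chats will still be cut at `max_length=600` tokens, so it may be worth summarizing only the most recent messages.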
Evaluation results
- Validation ROUGE-1 on SAMSum Corpus (translated to Russian), self-reported: 34.500
- Validation ROUGE-L on SAMSum Corpus (translated to Russian), self-reported: 33.000
- Test ROUGE-1 on SAMSum Corpus (translated to Russian), self-reported: 31.000
- Test ROUGE-L on SAMSum Corpus (translated to Russian), self-reported: 28.000
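For readers unfamiliar with the metrics, here is a minimal sketch of ROUGE-1 and ROUGE-L as F1 scores. It is not the evaluation script used for the numbers above: it assumes plain lowercase whitespace tokenization with no stemming, and it returns values in the 0 to 1 range, whereas the scores listed here appear to be on a 0 to 100 scale. The function names are illustrative.

```python
from collections import Counter
from typing import List


def _f1(overlap: int, ref_len: int, cand_len: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def _lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_1(reference: str, candidate: str) -> float:
    """Unigram overlap F1 (clipped counts)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    overlap = sum((Counter(ref) & Counter(cand)).values())
    return _f1(overlap, len(ref), len(cand))


def rouge_l(reference: str, candidate: str) -> float:
    """Longest-common-subsequence F1, which rewards matching word order."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    return _f1(_lcs_length(ref, cand), len(ref), len(cand))


reference = "the cat sat on the mat"
candidate = "on the mat the cat sat"
r1 = rouge_1(reference, candidate)
rl = rouge_l(reference, candidate)
print(r1, rl)
```

The example pair uses the same words in a different order, so ROUGE-1 is perfect while ROUGE-L drops. This is why the two metrics are reported separately.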