tdelic committed on
Commit 5070bea
1 Parent(s): d6258ee

Delete README.md.md

Files changed (1)
  1. README.md.md +0 -123
README.md.md DELETED
@@ -1,123 +0,0 @@

# Fine-tuned Mistral Model for Multi-Document Summarization

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), trained on the [multi_x_science_sum](https://huggingface.co/datasets/multi_x_science_sum) dataset.

## Model description

Mistral-7B-multixscience-finetuned is fine-tuned on the multi_x_science_sum dataset to extend the original Mistral model's capabilities on multi-document summarization tasks. Building on the Mistral foundation model, it is adapted to synthesize and summarize information from multiple documents.

## Training and evaluation dataset

Multi_x_science_sum (Multi-XScience) is a large-scale multi-document summarization dataset created from scientific articles. It introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references.

* [Paper](https://arxiv.org/pdf/2010.14235.pdf)
* [Source](https://huggingface.co/datasets/multi_x_science_sum)
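
For orientation, a single record can be pictured as the nested dictionary below. This sketch assumes only the fields that the prompt-generation code reads (`abstract`, `ref_abstract['cite_N']`, `ref_abstract['abstract']`, `related_work`). Real rows may carry further fields. The `@cite_N` identifier style and all text values are illustrative placeholders, not data from the dataset. `cited_pairs` is a hypothetical helper, not part of the original training code.

```python
# Hypothetical, hand-written record illustrating only the fields that the
# prompt-generation code reads; the texts are placeholders, not real data.
record = {
    "abstract": "We study summarization of scientific related-work sections ...",
    "ref_abstract": {
        # Parallel lists: the i-th identifier labels the i-th cited abstract
        "cite_N": ["@cite_1", "@cite_2"],
        "abstract": [
            "Abstract of the first cited paper ...",
            "Abstract of the second cited paper ...",
        ],
    },
    # Target text the model learns to produce
    "related_work": "Prior work @cite_1 ... while @cite_2 ...",
}


def cited_pairs(record):
    """Pair each citation identifier with its abstract, refusing misaligned lists."""
    ids = record["ref_abstract"]["cite_N"]
    abstracts = record["ref_abstract"]["abstract"]
    if len(ids) != len(abstracts):
        raise ValueError("cite_N and abstract lists differ in length")
    return list(zip(ids, abstracts))


print(cited_pairs(record))
```

The explicit length check matters because Python's `zip`, which the prompt-generation code uses, silently stops at the shorter of the two lists.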

The training and evaluation datasets were generated from multi_x_science_sum specifically to fine-tune the model for multi-document summarization, in particular for generating the related-work sections of scientific papers. A custom prompt-generation process turns each example into a simulation of the task: synthesizing a related-work section from a given paper's abstract and the abstracts of its referenced papers.

### Dataset generation process

The process generates prompts that instruct the model to combine the abstract of the current paper with the abstracts of the cited papers into a new related-work section. This mimics the real-world scenario in which a researcher synthesizes information from multiple sources to draft a paper's related-work section.

* **Prompt structure:** each data point is an instructional prompt that includes:

  * the abstract of the current paper;
  * the abstracts of the cited papers, each labeled with a unique identifier;
  * the expected model response, i.e. the paper's reference related-work section.

### Prompt generation code

```python
def generate_related_work_prompt(data):
    """Build one training example (a single text field) from a multi_x_science_sum record."""
    prompt = "[INST] <<SYS>>\n"
    prompt += "Use the abstract of the current paper and the abstracts of the cited papers to generate new related work.\n"
    prompt += "<</SYS>>\n\n"
    prompt += "Input:\nCurrent Paper's Abstract:\n- {}\n\n".format(data['abstract'])
    prompt += "Cited Papers' Abstracts:\n"
    # cite_N and abstract are parallel lists: one identifier per cited abstract
    for cite_id, cite_abstract in zip(data['ref_abstract']['cite_N'], data['ref_abstract']['abstract']):
        prompt += "- {}: {}\n".format(cite_id, cite_abstract)
    # The reference related-work section is appended as the target response
    prompt += "\n[/INST]\n\nGenerated Related Work:\n{}\n".format(data['related_work'])
    return {"text": prompt}
```

The dataset generated through this process was used to train and evaluate the fine-tuned model, with the aim of teaching it to synthesize information from multiple sources into cohesive summaries.

## Training hyperparameters

The following hyperparameters were used during training:
```
learning_rate: 2e-5
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_8bit
num_epochs: 5
```
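
The values above correspond to Hugging Face `TrainingArguments` keyword names. The sketch below records that correspondence as a plain dictionary and works out how many optimizer steps the schedule implies. It assumes single-device training with no gradient accumulation, neither of which the README states. The training-set size passed in is hypothetical, because the README does not give the size of the generated dataset. The `"adamw_8bit"` string is reused verbatim from the README, and the actual training script is not shown.

```python
import math

# README hyperparameters expressed with Hugging Face TrainingArguments keyword
# names (a sketch; the README does not show the actual training script).
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "optim": "adamw_8bit",
    "num_train_epochs": 5,
}


def total_optimizer_steps(num_examples, batch_size=4, epochs=5):
    """Optimizer steps for the whole run.

    Assumes one device, no gradient accumulation, and that the last,
    possibly partial, batch of each epoch is kept.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# Hypothetical training-set size, for illustration only
print(total_optimizer_steps(1000))  # 250 steps per epoch x 5 epochs = 1250
```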

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftConfig, PeftModel

base_model = "mistralai/Mistral-7B-v0.1"
adapter = "OctaSpace/Mistral7B-fintuned"

# Load tokenizer (left padding suits batched generation)
tokenizer = AutoTokenizer.from_pretrained(
    base_model,
    add_bos_token=True,
    trust_remote_code=True,
    padding_side='left'
)

# Create PEFT model: load the base model in 4-bit, then attach the fine-tuned adapter
config = PeftConfig.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map='auto',
    torch_dtype='auto'
)
model = PeftModel.from_pretrained(model, adapter)
# device_map='auto' already placed the weights; 4-bit models must not be moved with .to()
model.eval()

# Prompt content: the adapter was trained on plain-text prompts built by
# generate_related_work_prompt on a base (non-chat) model, so use the same
# format, ending right after "Generated Related Work:\n", and tokenize it directly
prompt = ""  # Put your related-work generation instruction here

inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
with torch.no_grad():
    summary_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = summary_ids[:, inputs['input_ids'].shape[1]:]
summaries = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)

# Model response:
print(summaries[0])
```
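
The usage example leaves the related-work instruction as a placeholder. A minimal sketch of a builder for it follows. `build_inference_prompt` and its arguments are hypothetical names. The template strings are copied from `generate_related_work_prompt` up to and including the `Generated Related Work:` line, and the reference related-work text is deliberately omitted.

```python
def build_inference_prompt(abstract, cited):
    """Return a prompt in the training format, ending where the target text began.

    abstract: the current paper's abstract.
    cited: list of (citation_id, cited_abstract) pairs, e.g. ("@cite_1", "...").
    """
    prompt = "[INST] <<SYS>>\n"
    prompt += ("Use the abstract of the current paper and the abstracts of the "
               "cited papers to generate new related work.\n")
    prompt += "<</SYS>>\n\n"
    prompt += "Input:\nCurrent Paper's Abstract:\n- {}\n\n".format(abstract)
    prompt += "Cited Papers' Abstracts:\n"
    for cite_id, cite_abstract in cited:
        prompt += "- {}: {}\n".format(cite_id, cite_abstract)
    # Same closing as training, but the related-work text is left for the model
    prompt += "\n[/INST]\n\nGenerated Related Work:\n"
    return prompt


prompt = build_inference_prompt(
    "Placeholder abstract of the paper being written.",
    [("@cite_1", "Placeholder abstract of a cited paper.")],
)
print(prompt)
```

During training, the reference related-work text came right after the final line. A prompt built this way therefore asks the model to continue with that section.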