Update README.md
README.md CHANGED
@@ -37,7 +37,8 @@ The base model [braindao/flan-t5-cnn](https://huggingface.co/braindao/flan-t5-cnn)
 ## Model description
 
 * This model was further fine-tuned from [braindao/flan-t5-cnn](https://huggingface.co/braindao/flan-t5-cnn) on the more conversational samsum dataset.
-* Huggingface [PEFT Library](https://github.com/huggingface/peft) LoRA (r = 16) was used to
+* Huggingface [PEFT Library](https://github.com/huggingface/peft) LoRA (r = 16) and bitsandbytes int-8 were used to speed up training and reduce the model size.
+* Only 1.7M parameters were trained (0.71% of the original flan-t5-base's 250M parameters).
 * The model checkpoint is just 7MB.
 
 ## Intended uses & limitations
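The LoRA and int-8 recipe in the two bullets added above is only summarized at a high level. Below is a minimal sketch of what such a setup typically looks like with the PEFT library. Only `r = 16` comes from the card; `lora_alpha`, `lora_dropout`, and `target_modules` are illustrative assumptions, not values from the actual training script.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_int8_training

# Load the base model with int-8 weights, as the bullets describe
model = AutoModelForSeq2SeqLM.from_pretrained(
    "braindao/flan-t5-cnn", load_in_8bit=True, device_map="auto"
)
model = prepare_model_for_int8_training(model)

# LoRA with r = 16 as stated in the card; the other values are assumptions
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,              # assumed, not from the training script
    lora_dropout=0.05,          # assumed, not from the training script
    target_modules=["q", "v"],  # typical T5 attention projections; assumed
)
model = get_peft_model(model, lora_config)

# Reports the trainable-parameter count, e.g. the ~1.7M (0.71%) quoted above
model.print_trainable_parameters()
```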
@@ -61,6 +62,36 @@ The following hyperparameters were used during training:
 - rougeL: 37.300937%
 - rougeLsum: 37.271341%
 
+### How to use
+
+```python
+import torch
+from peft import PeftModel, PeftConfig
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+# Load the PEFT config for the pre-trained checkpoint
+peft_model_id = "sooolee/flan-t5-base-cnn-samsum-lora"
+config = PeftConfig.from_pretrained(peft_model_id)
+
+# Load the base LLM model and tokenizer
+model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, load_in_8bit=True, device_map='auto')
+tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
+
+# Load the LoRA adapter on top of the base model
+model = PeftModel.from_pretrained(model, peft_model_id, device_map='auto')
+
+# Tokenize the text inputs
+texts = "<e.g. Part of YouTube Transcript>"
+inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
+
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+with torch.no_grad():
+    output = model.generate(input_ids=inputs["input_ids"].to(device), max_new_tokens=60, do_sample=True, top_p=0.9)
+    summary = tokenizer.batch_decode(output.detach().cpu().numpy(), skip_special_tokens=True)
+
+summary
+```
+
 ### Framework versions
 
 - Transformers 4.27.2
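A note on the generation settings in the snippet above: `do_sample=True` with `top_p=0.9` samples each summary, so output varies between runs; swapping those two arguments for `num_beams=4` (beam search) makes the output deterministic. `tokenizer.batch_decode` returns a list of strings, one summary per input sequence.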