Update README.md
README.md CHANGED
# Psychology Alpaca

This is a LLaMA-7B language model fine-tuned on 10,000 psychology-related prompts and answers generated by ChatGPT. The model was trained on a single A100 GPU on Google Colab. It shows some knowledge of the field of psychology and generally performs better than its LLaMA-7B base model.
### Background

This model was developed as part of a thesis project in the field of machine learning and psychology. It was used as a base model for further fine-tuning using reinforcement learning. The goal of the thesis was to compare reinforcement learning from *human feedback* and *AI feedback*.
### Paper

"Comparison Between RLHF and RLAIF in Fine-Tuning a Large Language Model"

The paper can be found [here](https://www.diva-portal.org/smash/record.jsf?dswid=3835&pid=diva2%3A1782683&c=2&searchType=SIMPLE&language=en&query=rlhf&af=%5B%5D&aq=%5B%5B%5D%5D&aq2=%5B%5B%5D%5D&aqe=%5B%5D&noOfRows=50&sortOrder=author_sort_asc&sortOrder2=title_sort_asc&onlyFullText=false&sf=undergraduate)!
### Usage

```
from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# Load the base model weights in 8-bit so they fit on a single GPU
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)

# Add the PEFT adapter on top of the base weights to get the Psychology Alpaca weights
model = PeftModel.from_pretrained(model, "kth/psychology-alpaca")
```
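
Once the adapter is loaded, you can query the model with an instruction-style prompt. The snippet below is a minimal sketch of how generation might look; the Alpaca-style prompt template and the `GenerationConfig` settings are illustrative assumptions, not the exact setup used in the thesis.

```
import torch

# Alpaca-style prompt template (assumed; adjust to match the format used during fine-tuning)
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWhat is cognitive dissonance?\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Illustrative sampling settings; tune temperature/top_p/max_new_tokens to taste
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    max_new_tokens=256,
)

with torch.no_grad():
    output = model.generate(**inputs, generation_config=generation_config)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```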

**Links**: [RLHF model](https://huggingface.co/samhog/psychology-llama-rlhf); [RLAIF model](https://huggingface.co/samhog/psychology-llama-rlaif)