Update README.md
Browse files
README.md
CHANGED
@@ -7,7 +7,7 @@ datasets:
|
|
7 |
This is a LLaMA-7B language model trained on 10.000 psychology-related prompts and answers generated by ChatGPT. The model was trained on a single A100 GPU from Google Colab. The model shows some knowledge in the field of psychology and generally performs better than its base model parent.
|
8 |
|
9 |
### Background
|
10 |
-
This model was developed as part of a thesis project in the field of machine learning and psychology. It was used as a base model for further fine-tuning using reinforcement learning. The goal of the thesis was to compare reinforcement learning from *human feedback* and *AI feedback*.
|
11 |
|
12 |
**Links**: [RLHF model](https://huggingface.co/samhog/psychology-llama-rlhf); [RLAIF model](https://huggingface.co/samhog/psychology-llama-rlaif)
|
13 |
|
|
|
7 |
This is a LLaMA-7B language model trained on 10.000 psychology-related prompts and answers generated by ChatGPT. The model was trained on a single A100 GPU from Google Colab. The model shows some knowledge in the field of psychology and generally performs better than its base model parent.
|
8 |
|
9 |
### Background
|
10 |
+
This model was developed as part of a thesis project in the field of machine learning and psychology. It was used as a base model for further fine-tuning using reinforcement learning. The goal of the thesis was to compare reinforcement learning from *human feedback* and *AI feedback*. The paper can be found [here](https://www.diva-portal.org/smash/record.jsf?aq2=%5B%5B%5D%5D&c=2&af=%5B%5D&searchType=SIMPLE&sortOrder2=title_sort_asc&query=rlhf&language=en&pid=diva2%3A1782683&aq=%5B%5B%5D%5D&sf=undergraduate&aqe=%5B%5D&sortOrder=author_sort_asc&onlyFullText=false&noOfRows=50&dswid=6661)!
|
11 |
|
12 |
**Links**: [RLHF model](https://huggingface.co/samhog/psychology-llama-rlhf); [RLAIF model](https://huggingface.co/samhog/psychology-llama-rlaif)
|
13 |
|