hyunseoki committed
Commit 9421869
1 Parent(s): d3545ce

Create README.md

Files changed (1):
1. README.md +51 -0

README.md ADDED

---
language:
- en
base_model:
- OpenAssistant/reward-model-deberta-v3-large-v2
---

## ReMoDetect: Robust Detection of Large Language Model Generated Texts Using a Reward Model

ReMoDetect addresses the growing risks of large language model (LLM) misuse, such as generating fake news, by improving detection of LLM-generated text (LGT). Rather than targeting individual models, ReMoDetect identifies traits common to LLMs by focusing on alignment training, in which LLMs are fine-tuned to generate human-preferred text. Our key finding is that aligned LLMs produce texts with higher estimated preferences than human-written ones, making them detectable with a reward model trained on the human preference distribution.

In ReMoDetect, we introduce two training strategies to enhance the reward model’s detection performance:
1. **Continual preference fine-tuning**, which pushes the reward model to further prefer aligned LGTs over human-written texts (see the training-objective sketch after this list).
2. **Reward modeling of Human/LLM mixed texts**, where we use rephrased human-written texts as a middle ground between LGTs and human texts to improve detection.

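As a rough sketch of the first strategy, the snippet below fine-tunes the base reward model with a pairwise (Bradley-Terry-style) preference loss that pushes scores for aligned LGTs above those for human-written texts. The paired example texts, the single gradient step, and the omission of the mixed-text rephrasing are simplifications for illustration; the exact objective and training setup are in the paper and the official repository.

```python
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

base_id = "OpenAssistant/reward-model-deberta-v3-large-v2"  # base reward model
tokenizer = AutoTokenizer.from_pretrained(base_id)
reward_model = AutoModelForSequenceClassification.from_pretrained(base_id)

def reward(texts):
    # Scalar preference score per text from the reward head.
    inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                       max_length=512, padding=True)
    return reward_model(**inputs).logits.squeeze(-1)

# Placeholder paired data: an aligned-LLM generation and a human-written text
# for the same prompt (in practice, batches drawn from a detection corpus).
llm_texts = ["An aligned-LLM continuation of some prompt."]
human_texts = ["A human-written continuation of the same prompt."]

# Pairwise preference loss: continually fine-tune the reward model so that it
# prefers aligned LGTs over human-written texts even more strongly.
loss = -F.logsigmoid(reward(llm_texts) - reward(human_texts)).mean()
loss.backward()  # a real loop would add an optimizer step, scheduling, etc.
```
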
This approach achieves state-of-the-art results across several LLMs. For more technical details, check out our [paper](https://arxiv.org/abs/2405.17382).

Please check the [official repository](https://github.com/hyunseoklee-ai/ReMoDetect) and the [project page](https://github.com/hyunseoklee-ai/ReMoDetect) for more implementation details and updates.

### How to Use
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "hyunseoki/ReMoDetect-deberta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
detector = AutoModelForSequenceClassification.from_pretrained(model_id)

text = 'This text was written by a person.'
inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=512, padding=True)

# The detector outputs a single preference score; higher scores indicate the
# text is more likely to be LLM-generated.
with torch.no_grad():
    score = detector(**inputs).logits[0]
print(score)
```
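
Because aligned LLM generations tend to receive higher estimated preference scores than human-written text, detection amounts to comparing scores, for example against a threshold calibrated on labeled validation data. The snippet below (reusing `tokenizer` and `detector` from above) is a minimal illustration; the example texts and the threshold value are placeholders, not values from the paper.

```python
# Score a small batch of texts and flag the ones that look LLM-generated.
texts = [
    "quick note to self: buy milk, call the plumber, fix the bike light",   # human-style
    "Certainly! Here is a concise, well-structured overview of the topic.", # LLM-style
]
inputs = tokenizer(texts, return_tensors='pt', truncation=True, max_length=512, padding=True)
with torch.no_grad():
    scores = detector(**inputs).logits.squeeze(-1)

threshold = 0.0  # hypothetical cutoff; calibrate on labeled human/LLM validation texts
for t, s in zip(texts, scores.tolist()):
    verdict = "likely LLM-generated" if s > threshold else "likely human-written"
    print(f"{s:+.3f}  {verdict}: {t[:40]}...")
```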

### Citation

If you find ReMoDetect-deberta useful for your work, please cite the following paper:

```bibtex
@misc{lee2024remodetect,
      title={ReMoDetect: Reward Models Recognize Aligned LLM's Generations},
      author={Hyunseok Lee and Jihoon Tack and Jinwoo Shin},
      year={2024},
      eprint={2405.17382},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2405.17382},
}
```