Commit 1f4491d by hendrydong
Parent: a82a31c

Update README.md

Files changed (1)
  1. README.md +0 -7
README.md CHANGED
@@ -52,14 +52,7 @@ This Reward model is the SOTA open-source RM (Apr 20, 2024) on Reward-Bench.
  | Safety | 88.76 |
  | Reasoning | 88.3 |
 
- ## See also
 
- You can also refer to our short blog for RM training details: https://www.notion.so/Reward-Modeling-for-RLHF-abe03f9afdac42b9a5bee746844518d0.
-
-
- ## Contact
-
- Please contact hanze.dong AT salesforce.com if you have any questions.
 
  ## References
  The repo was part of the iterative rejection sampling fine-tuning and iterative DPO. If you find the content of this repo useful in your work, please consider cite it as follows:
 