Tags: Text Classification, Transformers, Safetensors, llama, text-generation-inference, Inference Endpoints
hendrydong committed 06f806b (parent: 1f4491d)

Update README.md

Files changed (1): README.md +4 -2
README.md CHANGED
@@ -1,3 +1,6 @@
+---
+license: cc-by-sa-4.0
+---
 
 This reward function can be used for RLHF, including PPO, iterative SFT, iterative DPO.
 
@@ -73,5 +76,4 @@ The repo was part of the iterative rejection sampling fine-tuning and iterative
 archivePrefix={arXiv},
 primaryClass={cs.LG}
 }
-```
-
+```
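The substantive change in this commit is the YAML front-matter block now at the top of README.md, which the Hugging Face Hub reads as model-card metadata. The sketch below shows how that block can be pulled out of the README text. It uses only the standard library and assumes flat `key: value` lines like the single `license` entry added here. The helper name `parse_front_matter` is hypothetical, and real model cards with nested metadata would need a proper YAML parser.

```python
def parse_front_matter(readme_text: str) -> dict[str, str]:
    """Return flat key/value pairs from a leading '---' ... '---' block.

    Hypothetical helper. Assumes only simple scalar 'key: value' lines,
    with no nested YAML structures.
    """
    lines = readme_text.splitlines()
    # Front matter is only recognised if '---' is the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    # An opening '---' without a closing one is not treated as front matter.
    return {}


# README head after this commit (first lines of the new side of the diff).
after = """---
license: cc-by-sa-4.0
---

This reward function can be used for RLHF, including PPO, iterative SFT, iterative DPO.
"""

# README head before this commit: it began with a blank line, no metadata.
before = "\nThis reward function can be used for RLHF, including PPO, iterative SFT, iterative DPO.\n"

print(parse_front_matter(after))   # {'license': 'cc-by-sa-4.0'}
print(parse_front_matter(before))  # {}
```

Before the commit the README had no metadata block, so the helper finds nothing. After it, the license is machine-readable.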