mrzjy committed on
Commit 05bb979 · verified · 1 Parent(s): 87cedaa

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -175,7 +175,7 @@ We trained 2 models on the above dataset:
 - [NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward](https://huggingface.co/mrzjy/NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward): The PRM for outline generation task, trained by using TRL library ([Refer to Doc](https://huggingface.co/docs/trl/prm_trainer)).
 - Note: This model is trained with `train_on_last_step_only` flag set to `True`
 
-## 4. Performance Evaluation
+## 4. Usage & Performance Evaluation
 
 ### 4.1 Accuracy Metric
 
@@ -184,13 +184,12 @@ We trained 2 models on the above dataset:
 ```
 ```
 
-### 4.2 LLM Sampling with PRM
+### 4.2 Sequential Rejection Sampling
 
-Without delving into further reinforcement learning or policy updates, can we directly apply PRM with our LLMs? The answer is YES!
+Without delving into further reinforcement learning, can we directly apply PRM with our LLMs? The answer is YES!
 
-#### 4.2.1 Test-Time Scaling
+- Test-Time Scaling
 
-#### 4.2.2 Sequential Rejection Sampling
 
 - Case Study
 
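The context lines above note that the PRM was trained with the TRL library's PRM trainer and the `train_on_last_step_only` flag set to `True`. As a point of reference, a minimal sketch of that setup with `trl.PRMTrainer` might look like the following; the base model and the dataset path are placeholders, not taken from this repo's training code.

```python
# Minimal PRM training sketch with TRL's PRMTrainer (illustrative, not the repo's script).
# Assumptions: Qwen/Qwen2.5-0.5B-Instruct as the base model, and a placeholder dataset
# in TRL's "stepwise supervision" format: "prompt", "completions" (list of steps),
# and "labels" (one bool per step).
from datasets import load_dataset
from transformers import AutoModelForTokenClassification, AutoTokenizer
from trl import PRMConfig, PRMTrainer

base_model = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForTokenClassification.from_pretrained(base_model, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Placeholder dataset id: replace with the actual outline PRM training data.
train_dataset = load_dataset("your-org/novel-writing-outline-prm-data", split="train")

training_args = PRMConfig(
    output_dir="NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward",
    train_on_last_step_only=True,  # only the final step's label contributes to the loss
)
trainer = PRMTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```

With `train_on_last_step_only=True`, the trainer ignores intermediate step labels and supervises only the final step, which is what the note in the README refers to.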
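The renamed Section 4.2 says the PRM can be applied directly at inference time (test-time scaling via sequential rejection sampling). Below is a rough sketch of that idea under stated assumptions: the policy model, the acceptance threshold, and the convention of reading the PRM score off the final step-separator token all follow common practice rather than this repo's actual implementation.

```python
# Sequential rejection sampling with a PRM (illustrative sketch, not the repo's code).
# Idea: build the outline step by step; after each candidate step, score the partial
# outline with the PRM and resample the step until the score clears a threshold.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForTokenClassification,
    AutoTokenizer,
)

GEN_NAME = "Qwen/Qwen2.5-7B-Instruct"                            # assumed policy model
PRM_NAME = "mrzjy/NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward"  # PRM from the README

gen_tok = AutoTokenizer.from_pretrained(GEN_NAME)
gen_lm = AutoModelForCausalLM.from_pretrained(GEN_NAME, torch_dtype="auto", device_map="auto")
prm_tok = AutoTokenizer.from_pretrained(PRM_NAME)
prm = AutoModelForTokenClassification.from_pretrained(PRM_NAME, device_map="auto")

SEP = "\n"  # step separator (TRL's PRMConfig default); assumed to match training


def prm_score(prompt: str, steps: list[str]) -> float:
    """Probability that the latest step is good, read from the logits of the final
    separator token (assumes TRL's convention of labelling steps on the separator)."""
    text = prompt + SEP.join(steps) + SEP
    inputs = prm_tok(text, return_tensors="pt").to(prm.device)
    with torch.no_grad():
        logits = prm(**inputs).logits[0]       # (seq_len, 2)
    return torch.softmax(logits[-1], dim=-1)[1].item()


def sample_step(prompt: str, steps: list[str]) -> str:
    """Sample one more outline step from the policy model."""
    text = prompt + SEP.join(steps) + (SEP if steps else "")
    inputs = gen_tok(text, return_tensors="pt").to(gen_lm.device)
    out = gen_lm.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
    new_tokens = out[0, inputs["input_ids"].shape[1]:]
    return gen_tok.decode(new_tokens, skip_special_tokens=True).split(SEP)[0].strip()


def sequential_rejection_sampling(prompt: str, n_steps: int = 8,
                                  threshold: float = 0.5, max_tries: int = 4) -> list[str]:
    steps: list[str] = []
    for _ in range(n_steps):
        best_step, best_score = "", -1.0
        for _ in range(max_tries):
            candidate = sample_step(prompt, steps)
            score = prm_score(prompt, steps + [candidate])
            if score > best_score:
                best_step, best_score = candidate, score
            if score >= threshold:  # accept as soon as the PRM is satisfied
                break
        steps.append(best_step)  # otherwise keep the best candidate seen
    return steps
```

Raising `max_tries` trades extra sampling compute for higher PRM scores, which is one form of the test-time scaling mentioned in the new bullet.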