Update README.md
README.md
CHANGED
@@ -175,7 +175,7 @@ We trained 2 models on the above dataset:
 - [NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward](https://huggingface.co/mrzjy/NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward): The PRM for outline generation task, trained by using TRL library ([Refer to Doc](https://huggingface.co/docs/trl/prm_trainer)).
 - Note: This model is trained with `train_on_last_step_only` flag set to `True`

-## 4. Performance Evaluation
+## 4. Usage & Performance Evaluation

 ### 4.1 Accuracy Metric

@@ -184,13 +184,12 @@ We trained 2 models on the above dataset:
 ```
 ```

-### 4.2
+### 4.2 Sequential Rejection Sampling

-Without delving into further reinforcement learning
+Without delving into further reinforcement learning, can we directly apply PRM with our LLMs? The answer is YES!

-
+- Test-Time Scaling

-#### 4.2.2 Sequential Rejection Sampling

 - Case Study

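
The diff names sequential rejection sampling as a way to apply the PRM at test time but does not show the procedure itself. The following is only a minimal sketch of one common reading of the idea: build the outline one step at a time, and resample a step whenever the reward model scores it below a threshold. Everything here is an assumption made for illustration: `propose_step` stands in for an LLM call, `score_step` stands in for the PRM, and `threshold` and `max_tries` are hypothetical parameters. None of these names come from the README or from the TRL API.

```python
from typing import Callable, List, Optional, Sequence


def sequential_rejection_sampling(
    propose_step: Callable[[Sequence[str]], str],
    score_step: Callable[[Sequence[str], str], float],
    num_steps: int,
    threshold: float = 0.5,
    max_tries: int = 8,
) -> List[str]:
    """Build a sequence step by step, resampling steps the scorer rejects.

    propose_step(prefix) -> candidate next step (hypothetical LLM stand-in).
    score_step(prefix, candidate) -> reward for that candidate
    (hypothetical PRM stand-in).
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")

    steps: List[str] = []
    for _ in range(num_steps):
        best_step: Optional[str] = None
        best_score = float("-inf")
        for _ in range(max_tries):
            candidate = propose_step(steps)
            score = score_step(steps, candidate)
            # Remember the best candidate seen so far as a fallback.
            if score > best_score:
                best_step, best_score = candidate, score
            # Accept the first candidate that clears the threshold.
            if score >= threshold:
                break
        # If nothing cleared the threshold, keep the best-scoring attempt.
        assert best_step is not None
        steps.append(best_step)
    return steps
```

In this sketch a rejected step costs one extra generation, so `max_tries` bounds the extra compute spent per step. Falling back to the best-scoring attempt is a design choice of the sketch; stopping early or backtracking are equally plausible alternatives.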