mrzjy/NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward

Commit 05bb979 (verified) · 1 Parent(s): 87cedaa

mrzjy committed 3 days ago

Update README.md

Files changed (1)

README.md +4 -5

@@ -175,7 +175,7 @@ We trained 2 models on the above dataset:
 - [NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward](https://huggingface.co/mrzjy/NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward): The PRM for outline generation task, trained by using TRL library ([Refer to Doc](https://huggingface.co/docs/trl/prm_trainer)).
   - Note: This model is trained with `train_on_last_step_only` flag set to `True`
-## 4. Performance Evaluation
 ### 4.1 Accuracy Metric
@@ -184,13 +184,12 @@ We trained 2 models on the above dataset:
 ```
 ```
-### 4.2 LLM Sampling with PRM
-Without delving into further reinforcement learning or policy updates, can we directly apply PRM with our LLMs? The answer is YES!
-#### 4.2.1 Test-Time Scaling
-#### 4.2.2 Sequential Rejection Sampling
 - Case Study

 - [NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward](https://huggingface.co/mrzjy/NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward): The PRM for outline generation task, trained by using TRL library ([Refer to Doc](https://huggingface.co/docs/trl/prm_trainer)).
   - Note: This model is trained with `train_on_last_step_only` flag set to `True`
+## 4. Usage & Performance Evaluation
 ### 4.1 Accuracy Metric
 ```
 ```
+### 4.2 Sequential Rejection Sampling
+Without delving into further reinforcement learning, can we directly apply PRM with our LLMs? The answer is YES!
+- Test-Time Scaling
 - Case Study
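
For readers unfamiliar with the idea named in section 4.2, here is a minimal sketch of sequential rejection sampling guided by a PRM. It is not the author's implementation: `propose_step`, `score_step`, the threshold, and the retry budget are all hypothetical placeholders, and the toy stubs stand in for a real LLM and for the reward model. The sketch assumes the outline is built one step at a time, that several candidates are sampled for each step, and that the highest-scoring candidate is kept once it clears a threshold or the retry budget runs out.

```python
import random
from typing import Callable, List

# Hypothetical signatures (assumptions, not taken from the README):
#   propose_step(prefix)          -> one candidate next outline step from an LLM
#   score_step(prefix, candidate) -> PRM score for that candidate, higher is better
ProposeFn = Callable[[List[str]], str]
ScoreFn = Callable[[List[str], str], float]


def sequential_rejection_sampling(
    propose_step: ProposeFn,
    score_step: ScoreFn,
    max_steps: int = 5,
    num_candidates: int = 4,   # candidates sampled per round (assumed >= 1)
    threshold: float = 0.5,    # minimum score to accept a step (assumed value)
    max_retries: int = 3,      # sampling rounds per step (assumed >= 1)
) -> List[str]:
    """Build an outline step by step, keeping the best-scored candidate."""
    steps: List[str] = []
    for _ in range(max_steps):
        best_step, best_score = None, float("-inf")
        for _ in range(max_retries):
            for _ in range(num_candidates):
                candidate = propose_step(steps)
                score = score_step(steps, candidate)
                if score > best_score:
                    best_step, best_score = candidate, score
            # Stop resampling as soon as some candidate is good enough.
            if best_score >= threshold:
                break
        # If no candidate cleared the threshold, fall back to the best one seen.
        steps.append(best_step)
    return steps


# Toy stand-ins so the sketch runs without any model; replace with real calls.
_rng = random.Random(0)


def toy_propose(prefix: List[str]) -> str:
    return f"Chapter {len(prefix) + 1} (draft {_rng.randint(0, 9)})"


def toy_score(prefix: List[str], candidate: str) -> float:
    return _rng.random()


if __name__ == "__main__":
    for step in sequential_rejection_sampling(toy_propose, toy_score, max_steps=3):
        print(step)
```

Raising `num_candidates` or `max_retries` spends more inference compute per step, which is one way to read the "Test-Time Scaling" item above: more samples give the PRM more chances to find a step it scores highly.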