mrzjy committed · verified
Commit 8399cd2 · Parent(s): 067e712

Update README.md

Files changed (1): README.md (+3 −3)

README.md CHANGED
@@ -74,7 +74,7 @@ These datasets were combined to form our bilingual training data.
 
 ### 2.2 SFT Training Data
 
-For each novel, we used Qwen2.5-7B-Instruct to generate outline summaries for each chapter independently. Subsequently, we applied Qwen2.5-32B-Instruct to refine these outlines, ensuring smoother and more natural sequencing.
+For each novel, we used Qwen2.5-7B-Instruct to generate outline summaries for each chapter independently. Subsequently, we applied Qwen2.5-32B-Instruct to refine these outlines, ensuring smoother and more natural sequencing. We call the result the ground-truth `outline`.
 
 Additionally, a brief `synopsis` and `characters` are summarized as required for the outline generation tasks.
 
@@ -82,9 +82,9 @@ As a result, we can build an SFT training dataset for LLMs, which also serves as
 
 ### 2.3 PRM Training Data
 
-The training data for the outline-PRM is constructed as follows:
+The training data for the outline-PRM is basically constructed as follows:
 
-We assume that Qwen2.5-7B-generated outlines under such a simple prompt are **ALWAYS** inferior to human-written ones, and can be regarded as **LOW** quality.
+We assume that Qwen2.5-7B-generated outlines under such a simple prompt are **ALWAYS** inferior to ground-truth outlines, and can be regarded as **LOW** quality.
 
 Starting from the SFT dataset, we generate rollouts of each outline by providing the same prompt and the preceding ground-truth outlines. Each rollout is prompted to contain a similar number of words to the ground truth, and every rollout is then treated as a negative sample.
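The rollout-as-negative scheme in the updated PRM section can be sketched as follows. This is a minimal illustration, not the repository's actual pipeline: the record fields, the prompt wording, and the `generate` callable (standing in for a Qwen2.5-7B-Instruct call) are all assumptions.

```python
# Sketch: build outline-PRM training pairs from an SFT dataset.
# Assumptions (not from the README): the chapter record layout, the prompt
# text, and the `generate` stand-in for the rollout model are hypothetical.

def build_prm_pairs(chapters, generate):
    """For each chapter, label the ground-truth outline HIGH and a
    similar-length model rollout LOW, yielding PRM training examples."""
    examples = []
    context = []  # preceding ground-truth outlines only
    for ch in chapters:
        target_len = len(ch["outline"].split())
        prompt = (
            "Preceding outlines:\n" + "\n".join(context)
            + f"\n\nWrite the next chapter outline in about {target_len} words."
        )
        rollout = generate(prompt)  # assumed model call (e.g. Qwen2.5-7B-Instruct)
        examples.append({"prompt": prompt, "outline": ch["outline"], "label": "HIGH"})
        examples.append({"prompt": prompt, "outline": rollout, "label": "LOW"})
        # Rollouts never enter the context: later prompts always condition
        # on the preceding ground-truth outlines, as described above.
        context.append(ch["outline"])
    return examples
```

Keeping the context restricted to ground-truth outlines means each negative sample differs from its positive only in the final step, which is the contrast a process reward model needs.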