Update README.md
README.md
@@ -74,7 +74,7 @@ These datasets were combined to form our bilingual training data.
 
 ### 2.2 SFT Training Data
 
-For each novel, we used Qwen2.5-7B-Instruct to generate outline summaries for each chapter independently. Subsequently, we applied Qwen2.5-32B-Instruct to refine these outlines, ensuring smoother and more natural sequencing.
+For each novel, we used Qwen2.5-7B-Instruct to generate outline summaries for each chapter independently. Subsequently, we applied Qwen2.5-32B-Instruct to refine these outlines, ensuring smoother and more natural sequencing. We call the result the ground-truth `outline`.
 
 Additionally, a brief `synopsis` and `characters` are summarized as required for the outline generation tasks.
 
@@ -82,9 +82,9 @@ As a result, we can build an SFT training dataset for LLMs, which also serves as
 
 ### 2.3 PRM Training Data
 
 The training data for the outline-PRM is constructed as follows:
 
-We assume that Qwen2.5-7B-generated outlines under such a simple prompt are **ALWAYS** inferior to
+We assume that Qwen2.5-7B-generated outlines under such a simple prompt are **ALWAYS** inferior to the ground-truth outlines, and can therefore be regarded as **LOW** quality.
 
 Starting from the SFT dataset, we generate rollouts of each outline by providing the same prompt and the preceding ground-truth outlines. Each rollout is prompted to contain a similar number of words to the ground truth, and every rollout is then treated as a negative sample.
 
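The two-stage outline construction described in §2.2 can be sketched as a small pipeline. This is a minimal illustration, not the repository's actual code: `summarize` stands in for a Qwen2.5-7B-Instruct call on a single chapter, and `refine` for the Qwen2.5-32B-Instruct pass over all drafts; both names and the record layout are assumptions.

```python
from typing import Callable, Dict, List

def build_sft_record(
    chapters: List[str],
    summarize: Callable[[str], str],          # hypothetical: 7B call, chapter -> draft outline
    refine: Callable[[List[str]], List[str]], # hypothetical: 32B call, drafts -> smoothed outlines
) -> Dict[str, List[str]]:
    # Stage 1: summarize each chapter independently.
    drafts = [summarize(ch) for ch in chapters]
    # Stage 2: refine the drafts jointly so the sequence reads smoothly;
    # the refined outlines serve as the ground-truth `outline`.
    outlines = refine(drafts)
    return {"chapters": chapters, "outline": outlines}
```

Injecting the model calls as plain callables keeps the sketch runnable with stubs and makes the two-stage structure explicit.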
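The negative-sample construction in §2.3 can likewise be sketched. Again this is an assumed shape, not the project's API: `rollout` stands in for sampling Qwen2.5-7B-Instruct with the same prompt, the preceding ground-truth outlines, and a target word count matching the ground truth; the field names are illustrative.

```python
from typing import Callable, Dict, List

def build_prm_pairs(
    prompt: str,
    gt_outlines: List[str],
    rollout: Callable[[str, List[str], int], str],  # hypothetical: (prompt, prefix, target_words) -> text
) -> List[Dict[str, object]]:
    pairs = []
    for i, gt in enumerate(gt_outlines):
        prefix = gt_outlines[:i]        # preceding ground-truth outlines as context
        target_words = len(gt.split())  # ask for roughly the ground-truth length
        neg = rollout(prompt, prefix, target_words)
        # Under the LOW-quality assumption, every rollout is a negative sample
        # and the ground-truth outline is the positive.
        pairs.append({"prefix": prefix, "positive": gt, "negative": neg})
    return pairs
```

Matching the rollout length to the ground truth keeps the PRM from learning a trivial length cue instead of a quality signal.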