license: apache-2.0
language:
- zh
- en
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: token-classification
library_name: transformers
tags:
- novel-writing
- PRM
- outline
PRM for Simplistic Novel Outline Generation
This is a small project driven by personal interest, focused on developing a Process-Level Reward Model (PRM) for a specific task: generating outlines for novels.
The aim is to explore how PRMs can provide quality signals for the process of structured outline creation.
1. Task Definition
1.1 Novel Outline Generation
In practice, creating a novel outline typically involves a far more complex reflective process.
However, for the purposes of this experiment, the task is simplified as follows:
- Given a story idea and character designs, generate outlines for the first n chapters (n ranges from 1 to 10 in the training data).
Below are the system prompt templates used for training data construction, one per language:
- English
Act as a novel writer. Your task is to craft novel outlines based on the following story idea:
{story_idea}
Here's your character design:
{character}
Please create the story outlines for the first {n} chapters, with each chapter outline in {n_word} words.
- Chinese
你是一位专业小说写手。你的任务是基于以下故事灵感进行小说大纲创作。
{story_idea}
以下是你设计的角色:
{character}
请基于以上信息为小说前{n}章设计故事大纲,每个大纲大概在{n_word}字左右。
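As a minimal sketch of how the English template might be filled in, assuming plain Python string formatting (the helper name build_prompt and the range check are illustrative and not part of the original pipeline):

```python
# Template text copied from the English system prompt above.
EN_TEMPLATE = (
    "Act as a novel writer. Your task is to craft novel outlines "
    "based on the following story idea:\n"
    "{story_idea}\n\n"
    "Here's your character design:\n"
    "{character}\n\n"
    "Please create the story outlines for the first {n} chapters, "
    "with each chapter outline in {n_word} words."
)


def build_prompt(story_idea: str, character: str, n: int, n_word: int) -> str:
    """Fill the template (hypothetical helper, not from the original project).

    n is restricted to 1..10 because that is the range used when the
    training data was constructed.
    """
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    return EN_TEMPLATE.format(
        story_idea=story_idea, character=character, n=n, n_word=n_word
    )


if __name__ == "__main__":
    print(build_prompt("A story about Lily.", "Lily Christian, protagonist.", 3, 80))
```

The Chinese template can be handled the same way by swapping the template string.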
1.2 PRM Definition
A PRM is designed to provide process-level reward signals for generation tasks. In this context, each process or step refers specifically to a one-line outline representing a single chapter of a novel.
2. Training Data
2.1 Preparation
We collected data from two sources:
- 番茄小说 (Chinese dataset): ~1k novels, limited to the first several chapters.
- GoodNovel (English dataset): ~3k novels, limited to the first several chapters.
These datasets were combined to form our bilingual training data.
2.2 SFT Training Data
For each novel, we used Qwen2.5-7B-Instruct to summarize each chapter into an outline independently. We then used Qwen2.5-32B-Instruct to refine these outlines so that they read as a smoother, more natural sequence. We call the result the ground-truth outline.
In addition, a brief synopsis and a list of characters are summarized for each novel, as the outline generation task requires them as input.
As a result, we can build an SFT training dataset for LLMs, which also serves as the foundation for creating the PRM training dataset.
2.3 PRM Training Data
The training data for the outline PRM is constructed as follows:
We assume that outlines generated by Qwen2.5-7B under such a simple prompt are ALWAYS inferior to the ground-truth outlines and can be regarded as LOW quality.
Starting from the SFT dataset, we generate a rollout for each outline by providing the same prompt together with the preceding ground-truth outlines. Each rollout is prompted to contain a similar number of words to the ground truth, and every rollout is then treated as a negative sample.
This approach ensures a balanced distribution of positive and negative labels.
- Example negative samples:
{
"prompt":"你是一位专业小说写手。你的任务是基于以下故事灵感进行小说大纲创作。\n一个关于陈玄通过直播算命解决他人问题,展示超凡堪舆技能的故事。\n\n以下是你设计的角色:\n:\n\n角色1:陈玄,主角,一个18岁的道士,穿越后继承道观,拥有登峰造极的堪舆算命技能,通过直播算命帮助他人逆天改命。\n\n角色2:龙夏,配角,一个中年男子,直播间粉丝之一,对陈玄算命持怀疑态度,后被家人出事逼迫承认错误。\n\n角色3:汤巫山,配角,陈玄的道观所在之地,位于豫中,是陈玄施展法术的地方。\n\n角色4:飞仙观,配角,陈玄继承的道观,位于汤巫山,是陈玄进行算命活动的场所。\n\n角色5:九家军,配角,一个户外探险直播团队,成员包括“九家军神马东西”、“九家军先锋”等,因直播算命事件受到关注。\n\n角色6:阴九,配角,九家军成员之一,因直播间算命事件被家人出事逼迫承认错误,最终求助陈玄。\n\n角色7:杨晨,配角,一个高三学生,因梦遗问题求助陈玄,最终被诊断为中蛊,通过陈玄的帮助解决了问题。\n\n角色8:王浩,配角,杨晨的室友,因嫉妒杨晨而下蛊,最终被杨晨发现并处理。\n\n角色9:黄圆圆,配角,一个自称来自山河五仙的老奶奶,因杨晨的问题而威胁陈玄,最终被陈玄化解。\n\n角色10:宿管大叔,配角,杨晨所在宿舍楼的管理员,因杨晨的问题而参与处理,最终协助杨晨解决困扰。\n\n\n请基于以上信息为小说前9章设计故事大纲,每个大纲大概在75字左右。",
"completions":[
"第1章:陈玄意外穿越至另一个时空,继承了一座古旧道观,并发现自己拥有堪舆算命的超凡技能。他在直播算命时遭遇了一系列质疑,但当他展示付款码后,人气迅速飙升,引起了包括九家军在内的众多观众的注意。",
"第2章:在直播中,陈玄为阴九进行了算命。起初,他的预测遭到嘲笑和质疑。陈玄直接指出阴九将有大凶之兆,这一预言引发了激烈的争议。最终,陈玄凭借问及阴九奶奶的具体去世时间和病因,揭露了事实,证明了自己算命的准确性。",
"第3章:阴九在直播中质问陈玄,得知家人的安危后,他慌乱中不断打赏。陈玄揭示了阴九家人的凶兆,阴九接到妻子的电话,得知女儿病危。在绝望中,他跪地求救,但陈玄只能告知他已经发生,并警告阴九。",
"第4章:陈玄通过直播为阴九解难,揭示其面临的危机源自何处,并指导其进行一系列仪式以渡过劫难。直播间观众因为参与怂恿而面临巨大的阴德损失威胁。",
"第5章:陈玄通过直播算命帮助阴九等人解决了问题,面对大量粉丝的求助请求,他通过发放福袋的方式筛选出第一位求助者杨晨。陈玄成功诊断其为中蛊状态,并提供了解决方案。",
"第6章:杨晨在直播中向陈玄求助,陈玄通过直播为他解梦。杨晨因梦遗问题求助陈玄,陈玄发现他中蛊,通过直播向观众展示了解决方案,最终成功治愈杨晨。"
],
"labels":[
true,
true,
true,
true,
true,
false
]
}
{
"prompt":"Act as a novel writer. Your task is to craft novel outlines based on the following story idea:\nA story about Lily, who discovers her fiancé Nathaniel's affair, then gets unexpectedly married to Alexander Russell for a business deal. She confronts Nathaniel and Melanie's scandal at a perfume competition, proving her own creation and facing betrayal. The wedding night offers a moment of vulnerability and connection, but their relationship remains complex.\n\nHere's your character design:\nLily Christian, protagonist, a determined and skilled perfumer who faces betrayal and competition to reclaim her rightful place.\nNathaniel Hall, antagonist, a selfish businessman who uses and betrays Lily, only to find his plans backfiring.\nMelanie Thayer, antagonist, Nathaniel's unfaithful lover who steals Lily's work, unaware of the consequences.\nAlexander Russell, protagonist, a savvy businessman who marries Lily to help her and takes on MN Inc., though his motives are complex.\nAnthony Moore, supporting, Nathaniel's loyal secretary who helps hide the truth about Lily's whereabouts.\nOlivia Hart, supporting, Lily's assistant who supports her and helps orchestrate the plan to expose Nathaniel and Melanie.\n\nPlease create the story outlines for the first 10 chapters, with each chapter outline in 80 words",
"completions":[
"Chapter 1: Lily Christian, battling a headache and insomnia, overhears Nathaniel and his fiancé Melanie discussing an affair. Devastated, she recalls their years together and Nathaniel's betrayal through his business partnership with Melanie. Seeking validation, Lily receives an unexpected call from Alexander Russell, CEO of La Beauté Group, offering a meeting to discuss a business proposal. Rushing to the café, she boards a limousine, unsure of Russell's intentions.",
"Chapter 2: Lily arrives at the clerk's office, expecting to discuss a business proposal with Alexander Russell. Instead, he proposes marriage. Reluctantly, she accepts, and they quickly get married. Alexander directs her to pass on perfume information to Edward and schedules a meeting at La Beauté Group. Back at MN Inc., Lily encounters Nathaniel's secretary, Anthony, who informs her that Nathaniel is looking for her. In Nathaniel's office, she overhears his angry outburst at his assistant, Olivia, for not knowing her whereabouts.",
"Chapter 3: At MN Inc., Nathaniel and Melanie are agitated over missing documents. Nathaniel accuses Lily of being absent from the lab, but she explains she was preparing for a competition. Melanie reveals Lily's past reluctance to participate in such events. Nathaniel checks the documents in a bag Lily holds, and they discuss the upcoming talent competition. Nathaniel insists Lily won't participate, but Lily feels betrayed. She calls Olivia, her assistant, who reports MN Inc. is well-prepared. At La Beauté Group, Edward briefs her on the situation. Alexander notices Lily's injury and lifts her, eliciting a mix of concern and tension.",
"Chapter 4: Alexander tends to Lily's wound, showing a new level of care she has never seen from Nathaniel. At La Beauté Group, Lily watches Nathaniel and Melanie's confident performance, feeling a mix of resentment and determination. During the competition, the host reveals a scandal involving identical perfumes from MN Inc. and Rebirth, potentially implicating MN Inc. in plagiarism. Lily's resolve hardens as she realizes her past work could be jeopardized.",
"Chapter 5: The competition host announces a delay in awarding results due to identical perfumes submitted by MN Inc. and Rebirth. Nathaniel protests the postponement, while Melanie eagerly speculates about the other company. The host reveals both companies are suspected of plagiarism, and Rebirth’s representative confirms submission data. Nathaniel asserts that Mel is the sole creator of First Love, but the host asks Rebirth’s perfumer to step forward, undermining Nathaniel’s claim. Alexander watches, impressed by Lily’s growing confidence and determination.",
"Chapter 6: At MN Inc., Nathaniel confronts Melanie about the competition scandal. Melanie insists she’s the creator of First Love, but Nathaniel remains skeptical. Anthony, Nathaniel’s loyal secretary, tries to smooth things over, but Nathaniel’s anger grows. Lily returns to MN Inc., finding the lab in chaos. She discovers a note from Nathaniel, indicating he knows about her affair. Nathaniel accuses her of betrayal, but Lily denies any wrongdoing, insisting she’s focused on the competition. Alexander arrives, offering support and reassurance. Lily feels a mix of vulnerability and determination."
],
"labels":[
true,
true,
true,
true,
true,
false
]
}
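Records shaped like the two examples above could be assembled roughly as follows. This is a sketch under assumptions: the function name build_negative_sample and the 0-based step argument are illustrative, and the rollout text is assumed to have been generated elsewhere.

```python
from typing import Dict, List


def build_negative_sample(
    prompt: str, ground_truth: List[str], rollout: str, step: int
) -> Dict[str, object]:
    """Build one PRM record whose last step is a rollout (hypothetical helper).

    ground_truth: ground-truth chapter outlines, in order.
    rollout: the model-generated outline for chapter index `step` (0-based).
    The preceding ground-truth outlines are labeled True and the rollout,
    treated as low quality, is labeled False.
    """
    if not 0 <= step < len(ground_truth):
        raise ValueError("step must index an existing chapter")
    prefix = list(ground_truth[:step])
    return {
        "prompt": prompt,
        "completions": prefix + [rollout],
        "labels": [True] * len(prefix) + [False],
    }
```

With step=5 this yields five true labels followed by one false label, matching the layout of the examples.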
Note: Set the train_on_last_step_only flag so that only the label of the last step in each sample is trained on, which keeps the positive and negative labels balanced.
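To see why the flag matters for label balance, here is a small pure-Python illustration. It is not TRL's code; it only mimics the assumed effect of the flag, where every label except the last is replaced by an ignore value (-100, the default ignore_index of PyTorch's cross-entropy loss). The helper names are hypothetical.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def effective_labels(labels: List[bool], train_on_last_step_only: bool) -> List[int]:
    """Return the per-step labels that would contribute to the loss."""
    if train_on_last_step_only:
        # Assumed behaviour: mask every step but the last one.
        return [IGNORE_INDEX] * (len(labels) - 1) + [int(labels[-1])]
    return [int(x) for x in labels]


def count_labels(
    samples: List[List[bool]], train_on_last_step_only: bool
) -> Tuple[int, int]:
    """Count (positive, negative) labels that are actually trained on."""
    pos = neg = 0
    for labels in samples:
        for value in effective_labels(labels, train_on_last_step_only):
            if value == 1:
                pos += 1
            elif value == 0:
                neg += 1
    return pos, neg


if __name__ == "__main__":
    sample = [True, True, True, True, True, False]
    print(count_labels([sample], train_on_last_step_only=False))
    print(count_labels([sample], train_on_last_step_only=True))
```

Without the flag, a record with five ground-truth steps and one rollout contributes five positive labels and one negative label; with the flag, only the final label counts.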
3. Model Training
We trained two models on the above dataset:
- NovelWriting-Outline-Qwen2.5-7B-Instruct: the SFT LLM, trained with Llama-Factory.
- NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward: the PRM for the outline generation task, trained with the TRL library (refer to Doc).
- Note: This model is trained with the train_on_last_step_only flag set to True.
4. Performance Evaluation
4.1 Accuracy Metric
- Case Study
4.2 LLM Sampling with PRM
Without any further reinforcement learning or policy updates, can we directly apply the PRM to our LLMs at sampling time? The answer is YES!
4.2.1 Test-Time Scaling
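One common form of test-time scaling with a PRM is best-of-N selection: sample several complete outlines and keep the one the PRM rates highest. The sketch below shows that idea only; it is not necessarily the procedure used in this project. The score_steps callable stands in for the PRM and is hypothetical, and aggregating step scores with min is one assumed choice among several (product or last-step score are alternatives).

```python
from typing import Callable, List, Sequence


def best_of_n(
    candidates: Sequence[List[str]],
    score_steps: Callable[[List[str]], List[float]],
) -> List[str]:
    """Pick the candidate outline with the highest aggregated PRM score.

    candidates: complete outlines, each a list of chapter outlines.
    score_steps: hypothetical PRM wrapper returning one score per chapter.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")

    def aggregate(outline: List[str]) -> float:
        # An outline is only as good as its weakest chapter (assumption).
        return min(score_steps(outline))

    return max(candidates, key=aggregate)
```

Using min penalizes any single weak chapter, which suits outlines where one incoherent step can derail the rest of the plot.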
4.2.2 Sequential Rejection Sampling
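A plausible reading of sequential rejection sampling is to build the outline chapter by chapter, resampling a chapter until the PRM accepts it. The sketch below follows that reading and should not be taken as the project's exact method. The generate_step and score_step callables, the threshold, and the max_tries budget are all assumptions introduced for illustration.

```python
from typing import Callable, List


def sequential_rejection_sampling(
    n_chapters: int,
    generate_step: Callable[[List[str]], str],
    score_step: Callable[[List[str], str], float],
    threshold: float = 0.5,
    max_tries: int = 4,
) -> List[str]:
    """Generate an outline one chapter at a time, filtered by a PRM.

    generate_step: hypothetical LLM call, returns a candidate for the next
        chapter given the accepted chapters so far.
    score_step: hypothetical PRM call, returns a score for the candidate.
    A candidate is accepted as soon as its score reaches the threshold; if
    the budget runs out, the best candidate seen so far is kept.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    outline: List[str] = []
    for _ in range(n_chapters):
        best, best_score = "", float("-inf")
        for _ in range(max_tries):
            candidate = generate_step(outline)
            score = score_step(outline, candidate)
            if score > best_score:
                best, best_score = candidate, score
            if score >= threshold:
                break  # accepted, move on to the next chapter
        outline.append(best)
    return outline
```

Falling back to the best candidate seen keeps generation from stalling when no sample clears the threshold.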
- Case Study