Tags: Text Generation, Transformers, Safetensors, Chinese, English, qwen, conversational, custom_code
Commit 9ebae4d, committed by yuyijiong (1 parent: 3645429)

Upload 2 files

Files changed (2)
  1. README.md +2 -2
  2. README_en.md +3 -2
README.md CHANGED
@@ -11,7 +11,7 @@ pipeline_tag: text-generation
  ---
  **Read this in other languages: [English](README_en.md), [中文](README.md).**
 
- * 2023.12.16 update: released the [paper (Chinese version)](https://cloud.tsinghua.edu.cn/d/5894ec4442e54a6aac96/)
+ * 2023.12.16 update: released the [paper (Chinese version)](https://cloud.tsinghua.edu.cn/d/5894ec4442e54a6aac96/) and the [paper (English version)](https://cloud.tsinghua.edu.cn/d/5894ec4442e54a6aac96/)
  * 2023.12.14 update: released the fine-tuned Qwen-14b-chat-yarn-32k. After fine-tuning, the model handles Chinese and English question answering at a length of 32k (about 40,000 Chinese characters). Compared with the earlier 32k model obtained through position interpolation, it almost completely solves the problem of low recall in multi-document question answering (i.e., the "lost in middle" phenomenon).
  <br>
  <br>
@@ -58,7 +58,7 @@ print(response)
  ### 3. Instruction Fine-tuning
  * Fine-tuned the Qwen model on the [yuyijiong/Long-Instruction-Chinese](https://huggingface.co/datasets/yuyijiong/Long-Instruction-Chinese) data using the QLoRA method.
 
- * More training details will be presented in a future technical report.
+ * More training details are presented in the paper.
 
  <br>
 
README_en.md CHANGED
@@ -11,7 +11,7 @@ pipeline_tag: text-generation
  ---
  **Read this in other languages: [English](README_en.md), [中文](README.md).**
 
- * Updated on December 16, 2023: Release [Paper (Chinese)](https://cloud.tsinghua.edu.cn/d/5894ec4442e54a6aac96/)
+ * Updated on December 16, 2023: Release [Paper](https://cloud.tsinghua.edu.cn/d/5894ec4442e54a6aac96/)
  * Updated on December 14, 2023: We have released the Qwen-14b-chat-yarn-32k model, which has been fine-tuned to handle Chinese and English question-answering tasks with a length of up to 32k (approximately 40,000 Chinese characters). This model addresses the low recall issue in multi-document question-answering tasks (also known as the "lost in middle" phenomenon) that was present in the previous 32k model obtained through position interpolation. <br>
  <br>
  # Evaluation results in LongBench
@@ -26,6 +26,7 @@ pipeline_tag: text-generation
  | Qwen-14b-chat-32k-lora | 0.34 |
  | LongAlpaca-7b-32k-chinese-v2 | 0.12 |
  | CausalLM-14b | 0.086 |
+
  Qwen-14b-chat-yarn-32k has shown significant improvement in multi-document question-answering (or retrieval) tasks and outperforms other models of similar scale.
  <br>
 
@@ -56,7 +57,7 @@ During training, use_dynamic_ntk was set to True.
  ### 3.Instructional Fine-tuning
  * Using [yuyijiong/Long-Instruction-Chinese](https://huggingface.co/datasets/yuyijiong/Long-Instruction-Chinese) dataset and the Qlora method to fine-tune the Qwen model. The model was trained on a single A800 GPU for 5 days.
 
- * More training details will be presented in our future technical report.
+ * More training details are presented in our paper.
 
  <br>
 