Upload 2 files
- README.md +14 -2
- README_en.md +15 -1
README.md
CHANGED
@@ -11,6 +11,7 @@ pipeline_tag: text-generation
---
**Read this in other languages: [English](README_en.md), [中文](README.md).**

+* Update 2023.12.30: "Needle in a Haystack" test results
* Update 2023.12.28: released Qwen-7b-chat-yarn-32k; note that, likely because of the smaller model size and weaker base model, the 7b version is significantly weaker than Qwen-14b-chat-yarn-32k
* Update 2023.12.23: released the LongBench passage_retrieval_en evaluation results
* Update 2023.12.16: released the [paper (Chinese)](https://cloud.tsinghua.edu.cn/d/5894ec4442e54a6aac96/) and [paper (English)](https://arxiv.org/abs/2312.11193)
@@ -19,6 +20,11 @@ pipeline_tag: text-generation

# Qwen-14b-chat model with 32k context support

+## Main features of the model:
+* Based on Qwen-14b-chat, instruction-tuned with an "original text paraphrasing" task
+* Uses the YaRN interpolation method so the model can handle 32k or even longer texts
+* Gives high-accuracy answers at inference time without any special prompt
+
<br>

# LongBench test results
@@ -48,11 +54,17 @@ pipeline_tag: text-generation

After fine-tuning, Qwen-14b-chat-yarn-32k improves very noticeably on multi-document question-answering (or retrieval) tasks, far ahead of other models of the same size.

+
+# "Needle in a Haystack" test results
+![](大海捞针50k.png)
+* Even at a length of 50k (while no training sample exceeds 32k), retrieval accuracy remains extremely high, which shows that the model really does have strong long-context ability, greatly alleviates the "lost in the middle" problem, and has large headroom for further extension.
+* Moreover, at inference time the model needs no "original text paraphrasing": given only the question, it answers directly and correctly. (By contrast, claude2.1-200k needs a specific prompt to answer correctly.) This further demonstrates the model's capability.
+
<br>

# Usage
* When this model is used, ```config.use_logn_attn=False``` and ```config.use_dynamic_ntk=True``` are set automatically; this produces a warning but does not affect the model.
-*
+* For long-text tasks, put the long reference text first and the user's question last, and preferably prefix the question with **"问题:"** or **"Question: "** (see the multi-document QA example below) so the model can better distinguish the reference text from the question.
* Be sure to install ```flash-attention2```; otherwise inference on long texts is extremely slow and may raise errors.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
@@ -77,7 +89,7 @@ print(response)
### 3. Instruction fine-tuning
* Fine-tune the Qwen model on the [yuyijiong/Long-Instruction-Chinese](https://huggingface.co/datasets/yuyijiong/Long-Instruction-Chinese) data with the QLoRA method.

-*
+* More details are given in the paper.

<br>

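To make the fine-tuning note above concrete, a minimal QLoRA setup could look like the sketch below; the base checkpoint id, LoRA rank, and target-module names are illustrative assumptions drawn from standard peft/bitsandbytes usage with Qwen models, not settings reported in the paper.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Assumed base checkpoint; the README fine-tunes Qwen-14b-chat.
base_model = "Qwen/Qwen-14B-Chat"

# QLoRA: quantize the frozen base weights to 4-bit NF4 and train LoRA adapters on top.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,                      # rank and alpha are assumptions, not the paper's values
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["c_attn", "c_proj", "w1", "w2"],  # Qwen-1 attention/MLP projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# A standard supervised fine-tuning loop over yuyijiong/Long-Instruction-Chinese samples
# (packed up to 32k tokens) would then update only the LoRA parameters.
```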
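The "Needle in a Haystack" result shown above can be probed with a short script along the following lines; the repo id, filler text, needle sentence, and depth grid are assumptions for illustration, not the exact protocol behind the plotted figure.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for this checkpoint.
model_path = "yuyijiong/Qwen-14b-chat-yarn-32k"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

filler = "The city went about its ordinary business under a clear autumn sky. " * 40
needle = "The secret passcode for the vault is 7261."
question = "Question: What is the secret passcode for the vault?"

def probe(depth: float, n_blocks: int = 200) -> bool:
    """Hide the needle at a relative depth inside a long haystack and check retrieval."""
    blocks = [filler] * n_blocks
    blocks.insert(int(depth * n_blocks), needle)
    # Long reference text first, question last, as recommended in the Usage notes.
    prompt = "\n".join(blocks) + "\n\n" + question
    # chat() is the helper shipped with Qwen's remote modeling code.
    response, _ = model.chat(tokenizer, prompt, history=None)
    return "7261" in response

for depth in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"needle at depth {depth:.0%}: retrieved = {probe(depth)}")
```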
README_en.md
CHANGED
@@ -11,6 +11,7 @@ pipeline_tag: text-generation
---
**Read this in other languages: [English](README_en.md), [中文](README.md).**

+* Updated on December 30, 2023: "Needle in a Haystack" test results
* Updated on December 28, 2023: Released Qwen-7b-chat-yarn-32k; note that the 7b version may be significantly weaker than Qwen-14b-chat-yarn-32k due to the smaller model size and weaker base model.
* Updated on December 23, 2023: Released the evaluation results of passage_retrieval_en in LongBench
* Updated on December 16, 2023: Released the [paper](https://arxiv.org/abs/2312.11193)
@@ -19,6 +20,11 @@ pipeline_tag: text-generation

# Qwen-14b-chat model with 32k context window

+## Main features of the model:
+* Based on Qwen-14b-chat, fine-tuned with the "original text paraphrasing" task
+* Uses the YaRN interpolation method so the model can adapt to 32k or even longer texts
+* Gives high-accuracy answers at inference time without specially designed prompts
+
# Evaluation results in LongBench
### Evaluation results for passage_retrieval_zh in LongBench

@@ -46,11 +52,19 @@ pipeline_tag: text-generation


Qwen-14b-chat-yarn-32k shows a very significant improvement on multi-document question-answering (or retrieval) tasks after fine-tuning, and clearly outperforms other models of similar size.
+
+
+# Test Results for "Needle in a Haystack"
+![](大海捞针50k.png)
+* The model retrieves the needle accurately even at a context length of 50k or more (although no training sample exceeds 32k), which shows that it has strong long-context capability and greatly alleviates the "lost in the middle" problem.
+* In addition, the model does not need to paraphrase the original text during inference: given only the question, it answers directly and correctly. (By contrast, claude2.1-200k needs a specific prompt to answer correctly.) This further demonstrates the model's capability.
+
+
<br>

# Usage
* When this model is used, ```config.use_logn_attn=False``` and ```config.use_dynamic_ntk=True``` are set automatically; this produces a warning that can be ignored and does not affect the model's performance.
-* For tasks involving long texts, it is recommended to place the long reference text before the user's question.
+* For tasks involving long texts, place the long reference text before the user's question, and preferably add a prefix such as **"Question: "** before the question so that the model can better distinguish the reference text from the user's question.
* Please make sure to install ```flash-attention2```; otherwise inference on long texts will be extremely slow and errors may occur.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
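Putting the usage notes above together (long reference text first, a "Question: " prefix on the query, flash-attention2 installed), a minimal inference sketch might look like this; the repo id, the placeholder passages, and the sample question are illustrative assumptions, and generation goes through the `chat()` helper shipped with Qwen's remote code, as in the README's own example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for this checkpoint.
model_path = "yuyijiong/Qwen-14b-chat-yarn-32k"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,   # Qwen ships custom modeling code, including the chat() helper
)

# Long reference documents first, the user's question last, prefixed with "Question: ".
documents = [
    "Passage 1: ...",   # placeholder passages for illustration
    "Passage 2: ...",
    "Passage 3: ...",
]
prompt = "\n\n".join(documents) + "\n\nQuestion: Which passage mentions the launch date?"

response, _ = model.chat(tokenizer, prompt, history=None)
print(response)
```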