# LongReward-glm4-9b-DPO
Read this in [English](README.md)
<p align="center">
🤗 <a href="https://huggingface.co/datasets/THUDM/LongReward-10k" target="_blank">[LongReward Dataset] </a> • 💻 <a href="https://github.com/THUDM/LongReward" target="_blank">[Github Repo]</a> • 📃 <a href="https://arxiv.org/abs/" target="_blank">[LongReward Paper]</a>
</p>
LongReward-glm4-9b-DPO is the DPO version of [LongReward-glm4-9b-SFT](https://huggingface.co/THUDM/LongReward-glm4-9b-SFT) and supports a maximum context window of up to 64K tokens. It is trained on the `dpo_glm4_9b` split of
[LongReward-10k](https://huggingface.co/datasets/THUDM/LongReward-10k), a long-context preference dataset constructed via LongReward.
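For reference, the preference data can be inspected with the Hugging Face `datasets` library. This is a minimal sketch; the `dpo_glm4_9b` configuration name is taken from the description above, so please verify the exact name on the dataset page.
```python
from datasets import load_dataset

# Load the long-context preference pairs used for DPO training.
# The configuration name "dpo_glm4_9b" follows the description above;
# check the dataset card for the exact configuration/split names.
dpo_data = load_dataset("THUDM/LongReward-10k", "dpo_glm4_9b", split="train")
print(dpo_data)
```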
Environment requirements: the same as for [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) (`transformers>=4.46.0`).
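As a quick sanity check (not part of the original card), you can verify the installed `transformers` version before loading the model:
```python
from packaging import version
import transformers

# GLM-4 support used below requires transformers >= 4.46.0.
assert version.parse(transformers.__version__) >= version.parse("4.46.0"), \
    f"transformers {transformers.__version__} is too old; please upgrade to >= 4.46.0"
```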
A simple example of model deployment:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = 'THUDM/LongReward-glm4-9b-DPO'

# Load the tokenizer and the model, placing the weights across available devices.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")

message = [
    {
        "role": "user",
        "content": "W. Russell Todd, 94, United States Army general (b. 1928). February 13. Tim Aymar, 59, heavy metal singer (Pharaoh) (b. 1963). Marshall \"Eddie\" Conway, 76, Black Panther Party leader (b. 1946). Roger Bonk, 78, football player (North Dakota Fighting Sioux, Winnipeg Blue Bombers) (b. 1944). Conrad Dobler, 72, football player (St. Louis Cardinals, New Orleans Saints, Buffalo Bills) (b. 1950). Brian DuBois, 55, baseball player (Detroit Tigers) (b. 1967). Robert Geddes, 99, architect, dean of the Princeton University School of Architecture (1965–1982) (b. 1923). Tom Luddy, 79, film producer (Barfly, The Secret Garden), co-founder of the Telluride Film Festival (b. 1943). David Singmaster, 84, mathematician (b. 1938). \n\n What was Robert Geddes' profession?"
    }
]

# Build the chat-formatted input and move it to the model's device.
inputs = tokenizer.apply_chat_template(
    message,
    return_tensors='pt',
    add_generation_prompt=True,
    return_dict=True,
).to(model.device)
input_len = inputs['input_ids'].shape[1]

# Greedy decoding with at most 128 new tokens.
generate_kwargs = {
    "input_ids": inputs['input_ids'],
    "attention_mask": inputs['attention_mask'],
    "max_new_tokens": 128,
    "do_sample": False,
}
out = model.generate(**generate_kwargs)

# Print only the newly generated answer, without the prompt.
print(tokenizer.decode(out[0][input_len:], skip_special_tokens=True))
```
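For long answers it can be convenient to stream tokens as they are generated. The following sketch is not part of the original example; it reuses `model`, `tokenizer`, and `inputs` from above together with `transformers.TextStreamer`:
```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are produced,
# skipping the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    input_ids=inputs['input_ids'],
    attention_mask=inputs['attention_mask'],
    max_new_tokens=128,
    do_sample=False,
    streamer=streamer,
)
```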
## License
The use of this model's weights must follow the [LICENSE](LICENSE).
## Citation
If you find our work helpful, please consider citing the following paper.
```
@article{zhang2024longreward,
  title={LongReward: Improving Long-context Large Language Models with AI Feedback},
  author={Jiajie Zhang and Zhongni Hou and Xin Lv and Shulin Cao and Zhenyu Hou and Yilin Niu and Lei Hou and Yuxiao Dong and Ling Feng and Juanzi Li},
  journal={arXiv preprint arXiv:},
  year={2024}
}
```