Merge remote-tracking branch 'gh-origin/hf'
- README.md +16 -7
- README_zh.md +17 -8
- app.py +96 -34
- llmriddles/questions/executor.py +4 -0
- llmriddles/questions/level1.py +41 -21
- llmriddles/questions/level2.py +72 -42
- llmriddles/questions/level3.py +41 -18
- llmriddles/questions/level4.py +13 -6
- llmriddles/questions/level5.py +16 -8
- llmriddles/questions/question.py +10 -3
README.md
CHANGED
@@ -1,5 +1,5 @@
|
|
1 |
---
|
2 |
-
title: LLMRiddles
|
3 |
emoji: 🚀
|
4 |
colorFrom: indigo
|
5 |
colorTo: green
|
@@ -21,11 +21,19 @@ python_version: 3.8
|
|
21 |
<br>
|
22 |
</div>
|
23 |
|
|
|
|
|
24 |
## :thinking: What's This
|
25 |
Welcome to LLM Riddles! This is a game of wits and courage with language models. In the game, you need to construct questions that interact with the language model to get answers that meet the requirements. In this process, you can use your brain and use all the methods you can think of to get the model to output the results required by the answer.
|
26 |
|
27 |
## :space_invader: How to Play
|
28 |
-
We provide an online version for players to directly access and try out.
|
29 |
### ChatGPT + Chinese
|
30 |
```shell
|
31 |
QUESTION_LANG=cn QUESTION_LLM='chatgpt' QUESTION_LLM_KEY=<your API key> python3 -u app.py
|
@@ -50,14 +58,15 @@ The question format should include the following points:
|
|
50 |
- Modify the corresponding chapter question files
|
51 |
- Modification of init.py
|
52 |
|
53 |
-
For a complete example, please refer to: [Submit your own level design]()
|
54 |
|
55 |
## :writing_hand: Roadmap
|
56 |
|
57 |
- [x] Support custom levels
|
58 |
-
- [
|
59 |
-
- [
|
60 |
- [x] Support Mistral-7B(English version)
|
|
|
61 |
- [ ] Support Baichuan2-7B(Chinese version)
|
62 |
- [ ] Support LLaMA2-7B(English version)
|
63 |
- [ ] LLM inference speed optimization
|
@@ -68,12 +77,12 @@ For a complete example, please refer to: [Submit your own level design]()
|
|
68 |
- Discuss on OpenDILab's WeChat group (i.e. add us on WeChat: ding314assist)
|
69 |
<img src=https://github.com/opendilab/LLMRiddles/blob/main/llmriddles/assets/wechat.jpeg width=35% />
|
70 |
|
71 |
-
## Special Thanks
|
72 |
- Thanks to [Haoqiang Fan](https://www.zhihu.com/people/haoqiang-fan) for his original idea and title, which provided inspiration and motivation for the development and expansion of this project.
|
73 |
- Thanks to [HuggingFace](https://huggingface.co) for supporting and assisting the game.
|
74 |
- Thanks to [LLM Riddles contributors](https://github.com/opendilab/LLMRiddles/graphs/contributors) for their implementation and support.
|
75 |
|
76 |
-
## License
|
77 |
All code within this repository is under [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
|
78 |
|
79 |
<p align="right">(<a href="#top">back to top</a>)</p>
|
|
|
1 |
---
|
2 |
+
title: LLMRiddles-ChatGPT-CN
|
3 |
emoji: 🚀
|
4 |
colorFrom: indigo
|
5 |
colorTo: green
|
|
|
21 |
<br>
|
22 |
</div>
|
23 |
|
24 |
+
English | [简体中文](https://github.com/opendilab/LLMRiddles/blob/main/README_zh.md)
|
25 |
+
|
26 |
## :thinking: What's This
|
27 |
Welcome to LLM Riddles! This is a game of wits and courage with language models. In the game, you need to construct questions that interact with the language model to get answers that meet the requirements. In this process, you can use your brain and use all the methods you can think of to get the model to output the results required by the answer.
|
28 |
|
29 |
## :space_invader: How to Play
|
30 |
+
We provide an online version for players to directly access and try out.
|
31 |
+
- [ChatGPT + English(w/o key)](https://huggingface.co/spaces/OpenDILabCommunity/LLMRiddlesChatGPTEN)
|
32 |
+
- [ChatGPT + Chinese(w/o key)](https://huggingface.co/spaces/OpenDILabCommunity/LLMRiddlesChatGPTCN)
|
33 |
+
- [Mistral + English(w/ key)](https://4521e4d138d3779498.gradio.live)
|
34 |
+
- [ChatGPT + Chinese(w/ key)](http://llmriddles.opendilab.net/)
|
35 |
+
|
36 |
+
Local deployment can be done in the following ways:
|
37 |
### ChatGPT + Chinese
|
38 |
```shell
|
39 |
QUESTION_LANG=cn QUESTION_LLM='chatgpt' QUESTION_LLM_KEY=<your API key> python3 -u app.py
|
|
|
58 |
- Modify the corresponding chapter question files
|
59 |
- Modification of init.py
|
60 |
|
61 |
+
For a complete example, please refer to: [Submit your own level design](https://github.com/opendilab/LLMRiddles/pull/6)
|
62 |
|
63 |
## :writing_hand: Roadmap
|
64 |
|
65 |
- [x] Support custom levels
|
66 |
+
- [x] Online trial link
|
67 |
+
- [x] Hugging Face Space link
|
68 |
- [x] Support Mistral-7B(English version)
|
69 |
+
- [ ] Support ChatGLM(Chinese version)
|
70 |
- [ ] Support Baichuan2-7B(Chinese version)
|
71 |
- [ ] Support LLaMA2-7B(English version)
|
72 |
- [ ] LLM inference speed optimization
|
|
|
77 |
- Discuss on OpenDILab's WeChat group (i.e. add us on WeChat: ding314assist)
|
78 |
<img src=https://github.com/opendilab/LLMRiddles/blob/main/llmriddles/assets/wechat.jpeg width=35% />
|
79 |
|
80 |
+
## :star2: Special Thanks
|
81 |
- Thanks to [Haoqiang Fan](https://www.zhihu.com/people/haoqiang-fan) for his original idea and title, which provided inspiration and motivation for the development and expansion of this project.
|
82 |
- Thanks to [HuggingFace](https://huggingface.co) for supporting and assisting the game.
|
83 |
- Thanks to [LLM Riddles contributors](https://github.com/opendilab/LLMRiddles/graphs/contributors) for their implementation and support.
|
84 |
|
85 |
+
## :label: License
|
86 |
All code within this repository is under [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
|
87 |
|
88 |
<p align="right">(<a href="#top">back to top</a>)</p>
|
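Review note: the local-deployment commands in this README configure the app entirely through environment variables (`QUESTION_LANG`, `QUESTION_LLM`, `QUESTION_LLM_KEY`, plus `DEBUG` in `app.py`). The following is a minimal sketch of that pattern, not the project's code. `RiddleConfig` and `load_config` are hypothetical names, and the `chatgpt` default for `QUESTION_LLM` and the language check are assumptions; only the `cn` default for `QUESTION_LANG` and the `['chatgpt', 'mistral-7b']` check appear in the diff.

```python
import os
from typing import Mapping, NamedTuple, Optional


class RiddleConfig(NamedTuple):
    """Hypothetical container for the settings app.py reads from the environment."""
    lang: str
    llm: str
    llm_key: Optional[str]
    debug: bool


def load_config(env: Mapping[str, str] = os.environ) -> RiddleConfig:
    lang = env.get('QUESTION_LANG', 'cn')      # 'cn' default is shown in app.py
    llm = env.get('QUESTION_LLM', 'chatgpt')   # default value is an assumption
    if lang not in ('cn', 'en'):               # assumed check; app.py branches on cn/en
        raise ValueError(f'unsupported QUESTION_LANG: {lang!r}')
    if llm not in ('chatgpt', 'mistral-7b'):   # mirrors the assert visible in app.py
        raise ValueError(f'unsupported QUESTION_LLM: {llm!r}')
    return RiddleConfig(
        lang=lang,
        llm=llm,
        llm_key=env.get('QUESTION_LLM_KEY'),   # None when no key is preset
        debug=env.get('DEBUG', 'false').lower() == 'true',
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise without touching the real environment.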
README_zh.md
CHANGED
@@ -8,11 +8,19 @@
|
|
8 |
<br>
|
9 |
</div>
|
10 |
|
|
|
|
|
11 |
## :thinking: 什么是LLM Riddles
|
12 |
欢迎来到 LLM Riddles!这是一个与语言模型斗智斗勇的游戏。在游戏中,你需要构造与语言模型交互的问题,来得到符合要求的答案。在这个过程中,你可以开动脑筋,用你想到的所有方式,让模型输出答案要求的结果。
|
13 |
|
14 |
## :space_invader: 如何试玩
|
15 |
-
|
16 |
### ChatGPT + 中文
|
17 |
```shell
|
18 |
QUESTION_LANG=cn QUESTION_LLM='chatgpt' QUESTION_LLM_KEY=<your API key> python3 -u app.py
|
@@ -42,16 +50,17 @@ QUESTION_LANG=en QUESTION_LLM='llama2-7b' python3 -u app.py
|
|
42 |
- 对应章节问题文件的修改
|
43 |
- init.py的修改
|
44 |
|
45 |
-
完整示例请参考:[提交属于自己的关卡设计]()
|
46 |
|
47 |
## :writing_hand: 未来计划
|
48 |
|
49 |
- [x] 支持自定义关卡
|
50 |
-
- [
|
51 |
-
- [
|
52 |
-
- [
|
53 |
-
- [ ] 支持
|
54 |
- [ ] 支持Baichuan2-7B(中文)
|
|
|
55 |
- [ ] LLM 推理速度优化
|
56 |
|
57 |
## :speech_balloon: 反馈问题 & 提出建议
|
@@ -60,12 +69,12 @@ QUESTION_LANG=en QUESTION_LLM='llama2-7b' python3 -u app.py
|
|
60 |
- 在OpenDILab的群组中加入讨论(通过 WeChat: ding314assist 添加小助手微信)
|
61 |
<img src=https://github.com/opendilab/LLMRiddles/blob/main/llmriddles/assets/wechat.jpeg width=35% />
|
62 |
|
63 |
-
## Special Thanks
|
64 |
- 感谢 [Haoqiang Fan](https://www.zhihu.com/people/haoqiang-fan) 的原始创意和题目,为本项目的开发和扩展提供了灵感与动力。
|
65 |
- 感谢 [HuggingFace](https://huggingface.co) 对游戏的支持与协助。
|
66 |
- 感谢 [LLM Riddles contributors](https://github.com/opendilab/LLMRiddles/graphs/contributors) 的实现与支持。
|
67 |
|
68 |
-
## License
|
69 |
All code within this repository is under [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
|
70 |
|
71 |
<p align="right">(<a href="#top">back to top</a>)</p>
|
|
|
8 |
<br>
|
9 |
</div>
|
10 |
|
11 |
+
[English](https://github.com/opendilab/LLMRiddles/blob/main/README.md) | 简体中文
|
12 |
+
|
13 |
## :thinking: 什么是LLM Riddles
|
14 |
欢迎来到 LLM Riddles!这是一个与语言模型斗智斗勇的游戏。在游戏中,你需要构造与语言模型交互的问题,来得到符合要求的答案。在这个过程中,你可以开动脑筋,用你想到的所有方式,让模型输出答案要求的结果。
|
15 |
|
16 |
## :space_invader: 如何试玩
|
17 |
+
我们提供了在线版本以供玩家直接访问试玩:
|
18 |
+
- [ChatGPT + 英文(需配置api key)](https://huggingface.co/spaces/OpenDILabCommunity/LLMRiddlesChatGPTEN)
|
19 |
+
- [ChatGPT + 中文(需配置api key)](https://huggingface.co/spaces/OpenDILabCommunity/LLMRiddlesChatGPTCN)
|
20 |
+
- [Mistral + 英文(已预设api key)](https://4521e4d138d3779498.gradio.live)
|
21 |
+
- [ChatGPT + 中文(已预设api key)](http://llmriddles.opendilab.net/)
|
22 |
+
|
23 |
+
本地部署可以通过以下方式:
|
24 |
### ChatGPT + 中文
|
25 |
```shell
|
26 |
QUESTION_LANG=cn QUESTION_LLM='chatgpt' QUESTION_LLM_KEY=<your API key> python3 -u app.py
|
|
|
50 |
- 对应章节问题文件的修改
|
51 |
- init.py的修改
|
52 |
|
53 |
+
完整示例请参考:[提交属于自己的关卡设计](https://github.com/opendilab/LLMRiddles/pull/6)
|
54 |
|
55 |
## :writing_hand: 未来计划
|
56 |
|
57 |
- [x] 支持自定义关卡
|
58 |
+
- [x] 在线试玩链接
|
59 |
+
- [x] Hugging Face Space 链接
|
60 |
+
- [x] 支持Mistral-7B(英文)
|
61 |
+
- [ ] 支持ChatGLM(中文)
|
62 |
- [ ] 支持Baichuan2-7B(中文)
|
63 |
+
- [ ] 支持LLaMA2-7B(英文)
|
64 |
- [ ] LLM 推理速度优化
|
65 |
|
66 |
## :speech_balloon: 反馈问题 & 提出建议
|
|
|
69 |
- 在OpenDILab的群组中加入讨论(通过 WeChat: ding314assist 添加小助手微信)
|
70 |
<img src=https://github.com/opendilab/LLMRiddles/blob/main/llmriddles/assets/wechat.jpeg width=35% />
|
71 |
|
72 |
+
## :star2: Special Thanks
|
73 |
- 感谢 [Haoqiang Fan](https://www.zhihu.com/people/haoqiang-fan) 的原始创意和题目,为本项目的开发和扩展提供了灵感与动力。
|
74 |
- 感谢 [HuggingFace](https://huggingface.co) 对游戏的支持与协助。
|
75 |
- 感谢 [LLM Riddles contributors](https://github.com/opendilab/LLMRiddles/graphs/contributors) 的实现与支持。
|
76 |
|
77 |
+
## :label: License
|
78 |
All code within this repository is under [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
|
79 |
|
80 |
<p align="right">(<a href="#top">back to top</a>)</p>
|
app.py
CHANGED
@@ -1,13 +1,13 @@
|
|
|
|
1 |
import os
|
2 |
import uuid
|
3 |
-
import logging
|
4 |
|
5 |
import gradio as gr
|
6 |
|
7 |
from llmriddles.questions import QuestionExecutor
|
8 |
from llmriddles.questions import list_ordered_questions
|
9 |
|
10 |
-
|
11 |
count = 0
|
12 |
_QUESTIONS = list_ordered_questions()
|
13 |
_LANG = os.environ.get('QUESTION_LANG', 'cn')
|
@@ -17,21 +17,18 @@ assert _LLM in ['chatgpt', 'mistral-7b'], _LLM
|
|
17 |
_LLM_KEY = os.environ.get('QUESTION_LLM_KEY', None)
|
18 |
_DEBUG = os.environ.get('DEBUG', 'false').lower() == 'true'
|
19 |
|
|
|
|
|
|
|
|
|
20 |
if _LANG == "cn":
|
21 |
-
if _DEBUG:
|
22 |
-
logging.getLogger().setLevel(logging.INFO)
|
23 |
-
else:
|
24 |
-
logging.getLogger().setLevel(logging.WARNING)
|
25 |
title = "完蛋!我被 LLM 拿捏了"
|
26 |
requirement_ph = """
|
27 |
-
欢迎来到 LLM Riddles!
|
28 |
-
|
29 |
-
你将通过本游戏对大语言模型产生更深刻的理解。在本游戏中,你需要构造一个提给语言大模型的问题,使得它回复的答案符合题目要求。
|
30 |
-
|
31 |
-
点击\"下一题\"即可开始游戏
|
32 |
"""
|
33 |
requirement_label = "游戏须知/说明"
|
34 |
-
question_ph = "
|
35 |
question_label = "玩家提问栏"
|
36 |
answer_ph = "大语言模型的回答"
|
37 |
answer_label = "大语言模型回答栏"
|
@@ -41,17 +38,19 @@ if _LANG == "cn":
|
|
41 |
api_label = "API key"
|
42 |
predict_label = "结果正确性"
|
43 |
explanation_label = "结果详细解释"
|
44 |
-
game_cleared_label = "
|
45 |
correct_label = "正确"
|
46 |
wrong_label = "错误"
|
47 |
api_error_info = "请在提交问题之前先输入你的 API Key"
|
48 |
try_again_label = "再玩一次"
|
|
|
49 |
title_markdown = """
|
50 |
<div align="center">
|
51 |
<img src="https://raw.githubusercontent.com/opendilab/LLMRiddles/main/llmriddles/assets/banner.svg" width="80%" height="20%" alt="Banner Image">
|
52 |
</div>
|
53 |
-
<h2 style="text-align: center; color: black;"><a href="https://github.com/OpenDILab"> 🎭LLM Riddles:完蛋!我被 LLM 拿捏了</a></h2>
|
54 |
-
<
|
|
|
55 |
<strong><h5 align="center">注意:算法模型的输出可能包含一定的随机性。相关结果不代表任何开发者和相关 AI 服务的态度和意见。本项目开发者不对生成结果作任何保证,仅供娱乐。<h5></strong>
|
56 |
"""
|
57 |
tos_markdown = """
|
@@ -65,14 +64,11 @@ if _LANG == "cn":
|
|
65 |
elif _LANG == "en":
|
66 |
title = "LLM Riddles: Oops! Rolling in LLM."
|
67 |
requirement_ph = """
|
68 |
-
Welcome to LLM Riddles!
|
69 |
-
|
70 |
-
In this game, you'll gain a deeper understanding of language models. Your challenge is to create a question to ask a language model in a way that the answer it provides meets specific criteria.
|
71 |
-
|
72 |
-
Click \'Next\' to Start
|
73 |
"""
|
74 |
requirement_label = "Game Requirements"
|
75 |
-
question_ph = "Your Question for LLM"
|
76 |
question_label = "Question"
|
77 |
answer_ph = "Answer From LLM"
|
78 |
answer_label = "Answer"
|
@@ -82,17 +78,18 @@ elif _LANG == "en":
|
|
82 |
api_label = "API key"
|
83 |
predict_label = "Correctness"
|
84 |
explanation_label = "Explanation"
|
85 |
-
game_cleared_label = "Congratulations
|
86 |
correct_label = "Correct"
|
87 |
wrong_label = "Wrong"
|
88 |
api_error_info = "Please Enter API Key Before Submitting Question."
|
89 |
try_again_label = "Try Again"
|
|
|
90 |
title_markdown = """
|
91 |
<div align="center">
|
92 |
<img src="https://raw.githubusercontent.com/opendilab/LLMRiddles/main/llmriddles/assets/banner.svg" width="80%" height="20%" alt="Banner Image">
|
93 |
</div>
|
94 |
-
<h2 style="text-align: center; color: black;"><a href="https://github.com/OpenDILab"> 🎭LLM Riddles: Oops! Rolling in LLM.</a></h2>
|
95 |
-
<
|
96 |
<strong><h5 align="center">Notice: The output is generated by algorithm scheme and may involve some randomness. It does not represent the attitudes and opinions of any developers and AI services in this project. We do not make any guarantees about the generated content.<h5></strong>
|
97 |
"""
|
98 |
tos_markdown = """
|
@@ -123,7 +120,7 @@ if __name__ == '__main__':
|
|
123 |
gr.Markdown(title_markdown)
|
124 |
|
125 |
with gr.Row():
|
126 |
-
gr_requirement = gr.
|
127 |
with gr.Row():
|
128 |
with gr.Column():
|
129 |
gr_question = gr.TextArea(placeholder=question_ph, label=question_label)
|
@@ -131,6 +128,11 @@ if __name__ == '__main__':
|
|
131 |
with gr.Row():
|
132 |
gr_submit = gr.Button(submit_label, interactive=False)
|
133 |
gr_next = gr.Button(next_label)
|
134 |
|
135 |
with gr.Column():
|
136 |
gr_uuid = gr.Text(value='', visible=False)
|
@@ -139,6 +141,48 @@ if __name__ == '__main__':
|
|
139 |
gr_explanation = gr.TextArea(label=explanation_label, lines=1)
|
140 |
gr.Markdown(tos_markdown)
|
141 |
|
142 |
|
143 |
def _next_question(uuid_):
|
144 |
global count
|
@@ -146,40 +190,56 @@ if __name__ == '__main__':
|
|
146 |
uuid_ = str(uuid.uuid4())
|
147 |
count += 1
|
148 |
logging.info(f'Player {count} starts the game now')
|
149 |
-
global
|
150 |
-
|
|
|
|
|
|
|
151 |
_qid += 1
|
152 |
-
|
153 |
|
154 |
if _qid >= len(_QUESTIONS):
|
155 |
-
del
|
156 |
logging.info(f'Player {count} has passed the game now')
|
157 |
return game_cleared_label, '', '', {}, '', \
|
158 |
gr.Button(submit_label, interactive=False), \
|
159 |
gr.Button(try_again_label, interactive=True), \
|
160 |
-
''
|
|
|
|
|
|
|
|
|
161 |
else:
|
162 |
executor = QuestionExecutor(_QUESTIONS[_qid], _LANG)
|
163 |
-
|
|
|
164 |
gr.Button(submit_label, interactive=True), \
|
165 |
gr.Button(next_label, interactive=False), \
|
166 |
-
uuid_
|
|
|
|
|
|
|
|
|
|
|
|
|
167 |
|
168 |
gr_next.click(
|
169 |
fn=_next_question,
|
170 |
inputs=[gr_uuid],
|
171 |
outputs=[
|
172 |
gr_requirement, gr_question, gr_answer,
|
173 |
-
gr_predict, gr_explanation, gr_submit, gr_next,
|
|
|
174 |
],
|
175 |
)
|
176 |
|
177 |
|
178 |
def _submit_answer(qs_text: str, api_key: str, uuid_: str):
|
|
|
179 |
if _need_api_key() and not api_key:
|
180 |
raise gr.Error(api_error_info)
|
181 |
|
182 |
-
_qid =
|
183 |
executor = QuestionExecutor(
|
184 |
_QUESTIONS[_qid], _LANG,
|
185 |
llm=_LLM, llm_cfgs=_get_api_key_cfgs(api_key) if _need_api_key() else {'api_key': _LLM_KEY}
|
@@ -187,10 +247,12 @@ if __name__ == '__main__':
|
|
187 |
answer_text, correctness, explanation = executor.check(qs_text)
|
188 |
labels = {correct_label: 1.0} if correctness else {wrong_label: 1.0}
|
189 |
if correctness:
|
|
|
190 |
return answer_text, labels, explanation, gr.Button(next_label, interactive=True), uuid_
|
191 |
else:
|
192 |
return answer_text, labels, explanation, gr.Button(next_label, interactive=False), uuid_
|
193 |
|
|
|
194 |
gr_submit.click(
|
195 |
_submit_answer,
|
196 |
inputs=[gr_question, gr_api_key, gr_uuid],
|
|
|
1 |
+
import logging
|
2 |
import os
|
3 |
import uuid
|
|
|
4 |
|
5 |
import gradio as gr
|
6 |
|
7 |
from llmriddles.questions import QuestionExecutor
|
8 |
from llmriddles.questions import list_ordered_questions
|
9 |
|
10 |
+
_QUESTION_SESSIONS = {}
|
11 |
count = 0
|
12 |
_QUESTIONS = list_ordered_questions()
|
13 |
_LANG = os.environ.get('QUESTION_LANG', 'cn')
|
|
|
17 |
_LLM_KEY = os.environ.get('QUESTION_LLM_KEY', None)
|
18 |
_DEBUG = os.environ.get('DEBUG', 'false').lower() == 'true'
|
19 |
|
20 |
+
if _DEBUG:
|
21 |
+
logging.getLogger().setLevel(logging.INFO)
|
22 |
+
else:
|
23 |
+
logging.getLogger().setLevel(logging.WARNING)
|
24 |
if _LANG == "cn":
|
|
|
|
|
|
|
|
|
25 |
title = "完蛋!我被 LLM 拿捏了"
|
26 |
requirement_ph = """
|
27 |
+
<h2 style="color: #6d28d9;"> 欢迎来到 LLM Riddles! </h2>
|
28 |
+
<h4> 你将通过本游戏对大语言模型产生更深刻的理解。在本游戏中,你需要构造一个提给语言大模型的问题,使得它回复的答案符合题目要求。点击<i>\"下一题\"</i> 即可开始游戏。</h4>
|
|
|
|
|
|
|
29 |
"""
|
30 |
requirement_label = "游戏须知/说明"
|
31 |
+
question_ph = "你对大语言模型的提问(例如:请你输出1+1=3)"
|
32 |
question_label = "玩家提问栏"
|
33 |
answer_ph = "大语言模型的回答"
|
34 |
answer_label = "大语言模型回答栏"
|
|
|
38 |
api_label = "API key"
|
39 |
predict_label = "结果正确性"
|
40 |
explanation_label = "结果详细解释"
|
41 |
+
game_cleared_label = "<h2 style='color: #6d28d9;'>祝贺!你已成功通关!</h2>"
|
42 |
correct_label = "正确"
|
43 |
wrong_label = "错误"
|
44 |
api_error_info = "请在提交问题之前先输入你的 API Key"
|
45 |
try_again_label = "再玩一次"
|
46 |
+
select_label = "选择关卡(投机取巧需谨慎)"
|
47 |
title_markdown = """
|
48 |
<div align="center">
|
49 |
<img src="https://raw.githubusercontent.com/opendilab/LLMRiddles/main/llmriddles/assets/banner.svg" width="80%" height="20%" alt="Banner Image">
|
50 |
</div>
|
51 |
+
<h2 style="text-align: center; color: black;"><a href="https://github.com/OpenDILab/LLMRiddles"> 🎭LLM Riddles:完蛋!我被 LLM 拿捏了</a></h2>
|
52 |
+
<strong><h5 align="center"> 更多不同语言模型的在线试玩 demo 可以访问 GitHub<a href="https://github.com/OpenDILab/LLMRiddles">源代码仓库</a>获取<h5></strong>
|
53 |
+
<h5 align="center"> 如果你喜欢这个项目,请给我们在 GitHub 点个 star ✨ <a href="https://github.com/OpenDILab/LLMRiddles"> 代码仓库传送门 </a> 。我们将会持续保持更新。再次感谢游戏<a href="https://www.zhihu.com/people/haoqiang-fan"> 原作者 </a>的奇思妙想! </h5>
|
54 |
<strong><h5 align="center">注意:算法模型的输出可能包含一定的随机性。相关结果不代表任何开发者和相关 AI 服务的态度和意见。本项目开发者不对生成结果作任何保证,仅供娱乐。<h5></strong>
|
55 |
"""
|
56 |
tos_markdown = """
|
|
|
64 |
elif _LANG == "en":
|
65 |
title = "LLM Riddles: Oops! Rolling in LLM."
|
66 |
requirement_ph = """
|
67 |
+
<h2 style="color: #6d28d9;">Welcome to LLM Riddles! </h2>
|
68 |
+
<h4> In this game, you'll gain a deeper understanding of language models. Your challenge is to create a question to ask a language model in a way that the answer it provides meets specific criteria. Click <i>\'Next\'</i> to Start</h4>
|
|
|
|
|
|
|
69 |
"""
|
70 |
requirement_label = "Game Requirements"
|
71 |
+
question_ph = "Your Question for LLM (e.g. Please print 1+1=3)"
|
72 |
question_label = "Question"
|
73 |
answer_ph = "Answer From LLM"
|
74 |
answer_label = "Answer"
|
|
|
78 |
api_label = "API key"
|
79 |
predict_label = "Correctness"
|
80 |
explanation_label = "Explanation"
|
81 |
+
game_cleared_label = "<h2 style='color: #6d28d9;'>Congratulations!</h2>"
|
82 |
correct_label = "Correct"
|
83 |
wrong_label = "Wrong"
|
84 |
api_error_info = "Please Enter API Key Before Submitting Question."
|
85 |
try_again_label = "Try Again"
|
86 |
+
select_label = "Select level"
|
87 |
title_markdown = """
|
88 |
<div align="center">
|
89 |
<img src="https://raw.githubusercontent.com/opendilab/LLMRiddles/main/llmriddles/assets/banner.svg" width="80%" height="20%" alt="Banner Image">
|
90 |
</div>
|
91 |
+
<h2 style="text-align: center; color: black;"><a href="https://github.com/OpenDILab/LLMRiddles"> 🎭LLM Riddles: Oops! Rolling in LLM.</a></h2>
|
92 |
+
<h5 align="center"> If you like our project, please give us a star ✨ on GitHub for latest update <a href="https://github.com/OpenDILab/LLMRiddles"> (Code Link) </a>. Thanks for the interesting idea of the original game <a href="https://www.zhihu.com/people/haoqiang-fan"> author </a>. </h5>
|
93 |
<strong><h5 align="center">Notice: The output is generated by algorithm scheme and may involve some randomness. It does not represent the attitudes and opinions of any developers and AI services in this project. We do not make any guarantees about the generated content.<h5></strong>
|
94 |
"""
|
95 |
tos_markdown = """
|
|
|
120 |
gr.Markdown(title_markdown)
|
121 |
|
122 |
with gr.Row():
|
123 |
+
gr_requirement = gr.HTML(value=requirement_ph, label=requirement_label)
|
124 |
with gr.Row():
|
125 |
with gr.Column():
|
126 |
gr_question = gr.TextArea(placeholder=question_ph, label=question_label)
|
|
|
128 |
with gr.Row():
|
129 |
gr_submit = gr.Button(submit_label, interactive=False)
|
130 |
gr_next = gr.Button(next_label)
|
131 |
+
with gr.Row():
|
132 |
+
gr_select = gr.Radio(
|
133 |
+
choices=[(QuestionExecutor(q, _LANG).question_name, i) for i, q in enumerate(_QUESTIONS)],
|
134 |
+
label=select_label
|
135 |
+
)
|
136 |
|
137 |
with gr.Column():
|
138 |
gr_uuid = gr.Text(value='', visible=False)
|
|
|
141 |
gr_explanation = gr.TextArea(label=explanation_label, lines=1)
|
142 |
gr.Markdown(tos_markdown)
|
143 |
|
144 |
+
def _postprocess_question_text(question_text):
|
145 |
+
if _LANG == 'cn':
|
146 |
+
idx = question_text.find(',')
|
147 |
+
question_title = question_text[:idx]
|
148 |
+
former, latter = question_title.split('(')
|
149 |
+
question_title = former + ':' + latter[:-1]
|
150 |
+
question_text = f"<h2 style='color: #6d28d9;'>{question_title}</h2><h4>{question_text[idx+1:]}</h4>"
|
151 |
+
elif _LANG == 'en':
|
152 |
+
idx = question_text.find(',')
|
153 |
+
question_text = f"<h2 style='color: #6d28d9;'>{question_text[:idx]}</h2><h4>{question_text[idx+1:]}</h4>"
|
154 |
+
return question_text
|
155 |
+
|
156 |
+
|
157 |
+
def _radio_select(uuid_, select_qid):
|
158 |
+
global count
|
159 |
+
if not uuid_:
|
160 |
+
uuid_ = str(uuid.uuid4())
|
161 |
+
count += 1
|
162 |
+
logging.info(f'Player {count} starts the game now')
|
163 |
+
global _QUESTION_SESSIONS
|
164 |
+
if uuid_ not in _QUESTION_SESSIONS:
|
165 |
+
_QUESTION_SESSIONS[uuid_] = set(), select_qid
|
166 |
+
else:
|
167 |
+
_exists, _ = _QUESTION_SESSIONS[uuid_]
|
168 |
+
_QUESTION_SESSIONS[uuid_] = _exists, select_qid
|
169 |
+
|
170 |
+
executor = QuestionExecutor(_QUESTIONS[select_qid], _LANG)
|
171 |
+
question_text = _postprocess_question_text(executor.question_text)
|
172 |
+
return question_text, '', '', {}, '', \
|
173 |
+
gr.Button(submit_label, interactive=True), \
|
174 |
+
gr.Button(next_label, interactive=False), \
|
175 |
+
uuid_
|
176 |
+
|
177 |
+
gr_select.select(
|
178 |
+
_radio_select,
|
179 |
+
inputs=[gr_uuid, gr_select],
|
180 |
+
outputs=[
|
181 |
+
gr_requirement, gr_question, gr_answer,
|
182 |
+
gr_predict, gr_explanation, gr_submit, gr_next, gr_uuid,
|
183 |
+
],
|
184 |
+
)
|
185 |
+
|
186 |
|
187 |
def _next_question(uuid_):
|
188 |
global count
|
|
|
190 |
uuid_ = str(uuid.uuid4())
|
191 |
count += 1
|
192 |
logging.info(f'Player {count} starts the game now')
|
193 |
+
global _QUESTION_SESSIONS
|
194 |
+
if uuid_ in _QUESTION_SESSIONS:
|
195 |
+
_exists, _qid = _QUESTION_SESSIONS[uuid_]
|
196 |
+
else:
|
197 |
+
_exists, _qid = set(), -1
|
198 |
_qid += 1
|
199 |
+
_QUESTION_SESSIONS[uuid_] = _exists, _qid
|
200 |
|
201 |
if _qid >= len(_QUESTIONS):
|
202 |
+
del _QUESTION_SESSIONS[uuid_]
|
203 |
logging.info(f'Player {count} has passed the game now')
|
204 |
return game_cleared_label, '', '', {}, '', \
|
205 |
gr.Button(submit_label, interactive=False), \
|
206 |
gr.Button(try_again_label, interactive=True), \
|
207 |
+
'', \
|
208 |
+
gr.Radio(
|
209 |
+
choices=[(QuestionExecutor(q, _LANG).question_name, i) for i, q in enumerate(_QUESTIONS)],
|
210 |
+
label=select_label
|
211 |
+
)
|
212 |
else:
|
213 |
executor = QuestionExecutor(_QUESTIONS[_qid], _LANG)
|
214 |
+
question_text = _postprocess_question_text(executor.question_text)
|
215 |
+
return question_text, '', '', {}, '', \
|
216 |
gr.Button(submit_label, interactive=True), \
|
217 |
gr.Button(next_label, interactive=False), \
|
218 |
+
uuid_, \
|
219 |
+
gr.Radio(
|
220 |
+
choices=[(QuestionExecutor(q, _LANG).question_name, i) for i, q in enumerate(_QUESTIONS)],
|
221 |
+
value=_qid,
|
222 |
+
label=select_label,
|
223 |
+
)
|
224 |
+
|
225 |
|
226 |
gr_next.click(
|
227 |
fn=_next_question,
|
228 |
inputs=[gr_uuid],
|
229 |
outputs=[
|
230 |
gr_requirement, gr_question, gr_answer,
|
231 |
+
gr_predict, gr_explanation, gr_submit, gr_next,
|
232 |
+
gr_uuid, gr_select,
|
233 |
],
|
234 |
)
|
235 |
|
236 |
|
237 |
def _submit_answer(qs_text: str, api_key: str, uuid_: str):
|
238 |
+
global _QUESTION_SESSIONS
|
239 |
if _need_api_key() and not api_key:
|
240 |
raise gr.Error(api_error_info)
|
241 |
|
242 |
+
_exists, _qid = _QUESTION_SESSIONS[uuid_]
|
243 |
executor = QuestionExecutor(
|
244 |
_QUESTIONS[_qid], _LANG,
|
245 |
llm=_LLM, llm_cfgs=_get_api_key_cfgs(api_key) if _need_api_key() else {'api_key': _LLM_KEY}
|
|
|
247 |
answer_text, correctness, explanation = executor.check(qs_text)
|
248 |
labels = {correct_label: 1.0} if correctness else {wrong_label: 1.0}
|
249 |
if correctness:
|
250 |
+
_QUESTION_SESSIONS[uuid_] = (_exists | {_qid}), _qid
|
251 |
return answer_text, labels, explanation, gr.Button(next_label, interactive=True), uuid_
|
252 |
else:
|
253 |
return answer_text, labels, explanation, gr.Button(next_label, interactive=False), uuid_
|
254 |
|
255 |
+
|
256 |
gr_submit.click(
|
257 |
_submit_answer,
|
258 |
inputs=[gr_question, gr_api_key, gr_uuid],
|
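Review note: the `app.py` changes above replace a single global question index with a per-player `_QUESTION_SESSIONS` dict that maps a UUID to a pair of (solved question indices, current question index). The sketch below isolates that bookkeeping from the Gradio callbacks so the state transitions are easier to follow. The function names `start_or_advance`, `select_level`, and `mark_solved` are hypothetical, and the code is a simplified illustration under those assumptions, not the committed implementation.

```python
import uuid
from typing import Dict, Set, Tuple

# Stand-in for _QUESTION_SESSIONS: session id -> (solved indices, current index).
sessions: Dict[str, Tuple[Set[int], int]] = {}


def start_or_advance(session_id: str, total: int) -> Tuple[str, int, bool]:
    """Advance a session to its next question.

    Returns (session_id, question_index, cleared). A new id is minted when the
    caller passes an empty string, as the "Next" handler does in the diff.
    """
    if not session_id:
        session_id = str(uuid.uuid4())
    solved, qid = sessions.get(session_id, (set(), -1))
    qid += 1
    if qid >= total:
        # Game cleared: drop the session and hand back an empty id,
        # mirroring the '' returned for gr_uuid in the diff.
        sessions.pop(session_id, None)
        return '', qid, True
    sessions[session_id] = (solved, qid)
    return session_id, qid, False


def select_level(session_id: str, qid: int) -> str:
    """Jump straight to a level, keeping whatever was already solved."""
    if not session_id:
        session_id = str(uuid.uuid4())
    solved, _ = sessions.get(session_id, (set(), -1))
    sessions[session_id] = (solved, qid)
    return session_id


def mark_solved(session_id: str) -> None:
    """Record the current question as solved after a correct answer."""
    solved, qid = sessions[session_id]
    sessions[session_id] = (solved | {qid}, qid)
```

One design point worth flagging in review: a module-level dict grows with every player who never finishes, so a long-running Space may eventually want some form of expiry.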
llmriddles/questions/executor.py
CHANGED
@@ -15,6 +15,10 @@ class QuestionExecutor:
|
|
15 |
def question_text(self):
|
16 |
return self.question.texts[self.lang]
|
17 |
|
|
|
|
|
|
|
|
|
18 |
def check(self, qs_text: str) -> Tuple[str, bool, str]:
|
19 |
answer_text = get_llm_fn(self.llm)(qs_text, **self.llm_cfgs)
|
20 |
correct, explanation = self.check_answer(qs_text, answer_text)
|
|
|
15 |
def question_text(self):
|
16 |
return self.question.texts[self.lang]
|
17 |
|
18 |
+
@property
|
19 |
+
def question_name(self):
|
20 |
+
return self.question.names[self.lang]
|
21 |
+
|
22 |
def check(self, qs_text: str) -> Tuple[str, bool, str]:
|
23 |
answer_text = get_llm_fn(self.llm)(qs_text, **self.llm_cfgs)
|
24 |
correct, explanation = self.check_answer(qs_text, answer_text)
|
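Review note: the new `question_name` property is what `app.py` uses to build the `gr.Radio` choices as `(label, index)` pairs. The sketch below shows that relationship in isolation. `Question`, `ExecutorSketch`, and `radio_choices` are hypothetical stand-ins, and the shape of `Question` is an assumption, since this diff only shows that `texts` and `names` are indexed by language.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Question:
    # Assumed shape: per-language text and per-language short name.
    texts: Dict[str, str]
    names: Dict[str, str]


class ExecutorSketch:
    """Minimal stand-in for QuestionExecutor, without any LLM calls."""

    def __init__(self, question: Question, lang: str = 'cn'):
        self.question = question
        self.lang = lang

    @property
    def question_text(self) -> str:
        return self.question.texts[self.lang]

    @property
    def question_name(self) -> str:
        return self.question.names[self.lang]


def radio_choices(questions: List[Question], lang: str) -> List[Tuple[str, int]]:
    """Build (label, value) pairs in the same form as the gr.Radio choices."""
    return [(ExecutorSketch(q, lang).question_name, i) for i, q in enumerate(questions)]
```

Because the radio value is the list index, selecting a level can index straight into the ordered question list without a name lookup.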
llmriddles/questions/level1.py
CHANGED
@@ -1,14 +1,18 @@
|
|
1 |
from .question import register_question
|
2 |
|
|
|
3 |
def count_english_words(text: str):
|
4 |
return len(text.split(' '))
|
5 |
|
|
|
6 |
def count_chinese_words(text: str):
|
7 |
return len(text)
|
8 |
|
|
|
9 |
def check_if_chinese(text: str):
|
10 |
return all('\u4e00' <= char <= '\u9fff' for char in text)
|
11 |
|
|
|
12 |
def count_words(text: str, contain_punctuation: bool = False):
|
13 |
chinese_words = []
|
14 |
english_words = []
|
@@ -29,9 +33,9 @@ def count_words(text: str, contain_punctuation: bool = False):
|
|
29 |
temp_english_words = []
|
30 |
other_words.append(char)
|
31 |
if contain_punctuation:
|
32 |
-
return len(chinese_words)+len(english_words)+len(other_words)
|
33 |
else:
|
34 |
-
return len(chinese_words)+len(english_words)
|
35 |
|
36 |
|
37 |
CN_TEXT_1 = """
|
@@ -57,11 +61,14 @@ def _checker_1(question_text: str, user_text: str, answer_text: str, lang: str):
|
|
57 |
return False, None
|
58 |
|
59 |
|
60 |
-
register_question(
|
61 |
-
|
62 |
-
|
63 |
-
|
64 |
-
|
|
|
|
|
|
|
65 |
|
66 |
CN_TEXT_2 = """
|
67 |
第一章第二题(小试牛刀),请你输入三个字(及)以内的问题,使模型的回答在30个字以上。
|
@@ -89,11 +96,14 @@ def _checker_2(question_text: str, user_text: str, answer_text: str, lang: str):
|
|
89 |
return True, None
|
90 |
|
91 |
|
92 |
-
register_question(
|
93 |
-
|
94 |
-
|
95 |
-
|
96 |
-
|
|
|
|
|
|
|
97 |
|
98 |
CN_TEXT_3 = """
|
99 |
第一章第三题(短说长话),请你输入一个字的问题,使模型的回答在100个字以上。
|
@@ -107,6 +117,7 @@ For the third question in chapter 1, please enter a one-word question so that th
|
|
107 |
Please enter your query below and click the submit button
|
108 |
"""
|
109 |
|
|
|
110 |
def _checker_3(question_text: str, user_text: str, answer_text: str, lang: str):
|
111 |
_ = question_text, lang
|
112 |
answer_text = answer_text.strip()
|
@@ -119,11 +130,15 @@ def _checker_3(question_text: str, user_text: str, answer_text: str, lang: str):
|
|
119 |
else:
|
120 |
return True, None
|
121 |
|
122 |
-
register_question({
|
123 |
-
'cn': CN_TEXT_3,
|
124 |
-
'en': EN_TEXT_3,
|
125 |
-
}, _checker_3)
|
126 |
|
|
127 |
|
128 |
CN_TEXT_4 = """
|
129 |
第一章第四题(短说短话),请输入一个字的问题,使模型的回答字数小于20个字。
|
@@ -137,6 +152,7 @@ For the fourth question in chapter 1, please enter a one-word question so that t
|
|
137 |
Please enter your query below and click the submit button
|
138 |
"""
|
139 |
|
|
|
140 |
def _checker_4(question_text: str, user_text: str, answer_text: str, lang: str):
|
141 |
_ = question_text, lang
|
142 |
answer_text = answer_text.strip()
|
@@ -145,15 +161,19 @@ def _checker_4(question_text: str, user_text: str, answer_text: str, lang: str):
|
|
145 |
if count_words(user_text) > 1:
|
146 |
return False, "用户的问题长度应该在一个字及以内" if lang == 'cn' else 'Question should be one word.'
|
147 |
elif count_words(answer_text) >= 20:
|
148 |
-
return False, "大语言模型的答案应该小于
|
149 |
else:
|
150 |
return True, None
|
151 |
|
152 |
-
register_question({
|
153 |
-
'cn': CN_TEXT_4,
|
154 |
-
'en': EN_TEXT_4,
|
155 |
-
}, _checker_4)
|
156 |
|
157 |
|
158 |
# CN_TEXT_5 = """
|
159 |
# 第一章第五题(回文不变),请输入一个本身不是回文串的问题,使无论正着问还是倒着问,模型的回答是一样的。
|
|
|
1 |
from .question import register_question
|
2 |
|
3 |
+
|
4 |
def count_english_words(text: str):
|
5 |
return len(text.split(' '))
|
6 |
|
7 |
+
|
8 |
def count_chinese_words(text: str):
|
9 |
return len(text)
|
10 |
|
11 |
+
|
12 |
def check_if_chinese(text: str):
|
13 |
return all('\u4e00' <= char <= '\u9fff' for char in text)
|
14 |
|
15 |
+
|
16 |
def count_words(text: str, contain_punctuation: bool = False):
|
17 |
chinese_words = []
|
18 |
english_words = []
|
|
|
33 |
temp_english_words = []
|
34 |
other_words.append(char)
|
35 |
if contain_punctuation:
|
36 |
+
return len(chinese_words) + len(english_words) + len(other_words)
|
37 |
else:
|
38 |
+
return len(chinese_words) + len(english_words)
|
39 |
|
40 |
|
41 |
CN_TEXT_1 = """
|
|
|
61 |
return False, None
|
62 |
|
63 |
|
64 |
+
register_question(
|
65 |
+
{
|
66 |
+
'cn': CN_TEXT_1,
|
67 |
+
'en': EN_TEXT_1,
|
68 |
+
},
|
69 |
+
checkers=_checker_1,
|
70 |
+
name={'cn': '1-1 初来乍到', 'en': '1-1'},
|
71 |
+
)
|
72 |
|
73 |
CN_TEXT_2 = """
|
74 |
第一章第二题(小试牛刀),请你输入三个字(及)以内的问题,使模型的回答在30个字以上。
|
|
|
96 |
return True, None
|
97 |
|
98 |
|
99 |
+
register_question(
|
100 |
+
{
|
101 |
+
'cn': CN_TEXT_2,
|
102 |
+
'en': EN_TEXT_2,
|
103 |
+
},
|
104 |
+
checkers=_checker_2,
|
105 |
+
name={'cn': '1-2 小试牛刀', 'en': '1-2'},
|
106 |
+
)
|
107 |
|
108 |
CN_TEXT_3 = """
|
109 |
第一章第三题(短说长话),请你输入一个字的问题,使模型的回答在100个字以上。
|
|
|
117 |
Please enter your query below and click the submit button
|
118 |
"""
|
119 |
|
120 |
+
|
121 |
def _checker_3(question_text: str, user_text: str, answer_text: str, lang: str):
|
122 |
_ = question_text, lang
|
123 |
answer_text = answer_text.strip()
|
|
|
130 |
else:
|
131 |
return True, None
|
132 |
|
|
|
|
|
|
|
|
|
133 |
|
134 |
+
register_question(
|
135 |
+
{
|
136 |
+
'cn': CN_TEXT_3,
|
137 |
+
'en': EN_TEXT_3,
|
138 |
+
},
|
139 |
+
checkers=_checker_3,
|
140 |
+
name={'cn': '1-3 短说长话', 'en': '1-3'}
|
141 |
+
)
|
142 |
|
143 |
CN_TEXT_4 = """
|
144 |
第一章第四题(短说短话),请输入一个字的问题,使模型的回答字数小于20个字。
|
|
|
152 |
Please enter your query below and click the submit button
|
153 |
"""
|
154 |
|
155 |
+
|
156 |
def _checker_4(question_text: str, user_text: str, answer_text: str, lang: str):
|
157 |
_ = question_text, lang
|
158 |
answer_text = answer_text.strip()
|
|
|
161 |
if count_words(user_text) > 1:
|
162 |
return False, "用户的问题长度应该在一个字及以内" if lang == 'cn' else 'Question should be one word.'
|
163 |
elif count_words(answer_text) >= 20:
|
164 |
+
return False, "大语言模型的答案应该小于20个字" if lang == 'cn' else 'Answer should be less than 20 words.'
|
165 |
else:
|
166 |
return True, None
|
167 |
|
168 |
|
+register_question(
+    {
+        'cn': CN_TEXT_4,
+        'en': EN_TEXT_4,
+    },
+    checkers=_checker_4,
+    name={'cn': '1-4 短说短话', 'en': '1-4'},
+)
177 |
|
178 |
# CN_TEXT_5 = """
|
179 |
# 第一章第五题(回文不变),请输入一个本身不是回文串的问题,使无论正着问还是倒着问,模型的回答是一样的。
|
llmriddles/questions/level2.py
CHANGED
@@ -54,13 +54,18 @@ def _en_checker_1(question_text: str, user_text: str, answer_text: str) -> Tuple
|
|
54 |
return True, None
|
55 |
|
56 |
|
57 |
-
register_question(
|
58 |
-
|
59 |
-
|
60 |
-
|
61 |
-
|
62 |
-
|
63 |
-
|
|
|
|
|
|
|
|
|
|
|
64 |
|
65 |
CN_TEXT_2 = """
|
66 |
第二章第二题(越说越大),请输入一个大于一的正整数作为问题,使回答里包含至少比它大一千的数。
|
@@ -102,13 +107,18 @@ def _en_checker_2(question_text: str, user_text: str, answer_text: str) -> Tuple
|
|
102 |
return False, f'Did not detect a number of at least {value + 1000} in the output'
|
103 |
|
104 |
|
105 |
-
register_question(
|
106 |
-
|
107 |
-
|
108 |
-
|
109 |
-
|
110 |
-
|
111 |
-
|
|
|
|
|
|
|
|
|
|
|
112 |
|
113 |
CN_TEXT_3 = """
|
114 |
第二章第三题(越说越小),请输入一个大于一的正整数作为问题,使回答里包含至少10个不同的至少比它小一千的自然数。
|
@@ -161,13 +171,18 @@ def _en_checker_3(question_text: str, user_text: str, answer_text: str) -> Tuple
|
|
161 |
f'but it doesn\'t reach 10: {collected_values!r}'
|
162 |
|
163 |
|
164 |
-
register_question(
|
165 |
-
|
166 |
-
|
167 |
-
|
168 |
-
|
169 |
-
|
170 |
-
|
|
|
|
|
|
|
|
|
|
|
171 |
|
172 |
CN_TEXT_4 = """
|
173 |
第二章第四题(警世名言),请在不提及1,4,5这三个字符的情况下让模型回答114514,不要有多余的字符。
|
@@ -191,13 +206,18 @@ def _en_checker_4(question_text: str, user_text: str, answer_text: str) -> Tuple
|
|
191 |
return answer_text.strip() == '114514', None
|
192 |
|
193 |
|
194 |
-
register_question(
|
195 |
-
|
196 |
-
|
197 |
-
|
198 |
-
|
199 |
-
|
200 |
-
|
|
|
|
|
|
|
|
|
|
|
201 |
|
202 |
CN_TEXT_5 = """
|
203 |
第二章第五题(开平方数),请输入一个大于一的平方数作为问题,使回答里包含它的平方根对应的数。
|
@@ -245,13 +265,18 @@ def _en_checker_5(question_text: str, user_text: str, answer_text: str) -> Tuple
|
|
245 |
return False, f'Square root {sq} not detected'
|
246 |
|
247 |
|
248 |
-
register_question(
|
249 |
-
|
250 |
-
|
251 |
-
|
252 |
-
|
253 |
-
|
254 |
-
|
|
|
|
|
|
|
|
|
|
|
255 |
|
256 |
CN_TEXT_6 = """
|
257 |
第二章第六题(得寸进狗),请提一个不包含“狗”这个字的10个字以内的问题,使回答中“狗”这个字出现的次数至少是问题字数的2倍。
|
@@ -290,10 +315,15 @@ def _en_checker_6(question_text: str, user_text: str, answer_text: str) -> Tuple
|
|
290 |
f'which is less than {len(q_words) * 2} times.'
|
291 |
|
292 |
|
293 |
-
register_question(
|
294 |
-
|
295 |
-
|
296 |
-
|
297 |
-
|
298 |
-
|
299 |
-
|
|
|
|
|
|
|
|
|
|
|
|
54 |
return True, None
|
55 |
|
56 |
|
+register_question(
+    {
+        'cn': CN_TEXT_1,
+        'en': EN_TEXT_1,
+    },
+    checkers={
+        'cn': _cn_checker_1,
+        'en': _en_checker_1,
+    },
+    name={'cn': '2-1 质数长度', 'en': '2-1'},
+    level=2
+)
69 |
|
70 |
CN_TEXT_2 = """
|
71 |
第二章第二题(越说越大),请输入一个大于一的正整数作为问题,使回答里包含至少比它大一千的数。
|
|
|
107 |
return False, f'Did not detect a number of at least {value + 1000} in the output'
|
108 |
|
109 |
|
+register_question(
+    {
+        'cn': CN_TEXT_2,
+        'en': EN_TEXT_2,
+    },
+    checkers={
+        'cn': _cn_checker_2,
+        'en': _en_checker_2,
+    },
+    name={'cn': '2-2 越说越大', 'en': '2-2'},
+    level=2
+)
122 |
|
123 |
CN_TEXT_3 = """
|
124 |
第二章第三题(越说越小),请输入一个大于一的正整数作为问题,使回答里包含至少10个不同的至少比它小一千的自然数。
|
|
|
171 |
f'but it doesn\'t reach 10: {collected_values!r}'
|
172 |
|
173 |
|
+register_question(
+    {
+        'cn': CN_TEXT_3,
+        'en': EN_TEXT_3,
+    },
+    checkers={
+        'cn': _cn_checker_3,
+        'en': _en_checker_3,
+    },
+    name={'cn': '2-3 越说越小', 'en': '2-3'},
+    level=2,
+)
186 |
|
187 |
CN_TEXT_4 = """
|
188 |
第二章第四题(警世名言),请在不提及1,4,5这三个字符的情况下让模型回答114514,不要有多余的字符。
|
|
|
206 |
return answer_text.strip() == '114514', None
|
207 |
|
208 |
|
+register_question(
+    {
+        'cn': CN_TEXT_4,
+        'en': EN_TEXT_4,
+    },
+    checkers={
+        'cn': _cn_checker_4,
+        'en': _en_checker_4,
+    },
+    name={'cn': '2-4 警世名言', 'en': '2-4'},
+    level=2,
+)
221 |
|
222 |
CN_TEXT_5 = """
|
223 |
第二章第五题(开平方数),请输入一个大于一的平方数作为问题,使回答里包含它的平方根对应的数。
|
|
|
265 |
return False, f'Square root {sq} not detected'
|
266 |
|
267 |
|
+register_question(
+    {
+        'cn': CN_TEXT_5,
+        'en': EN_TEXT_5,
+    },
+    checkers={
+        'cn': _cn_checker_5,
+        'en': _en_checker_5,
+    },
+    name={'cn': '2-5 开平方数', 'en': '2-5'},
+    level=2,
+)
280 |
|
281 |
CN_TEXT_6 = """
|
282 |
第二章第六题(得寸进狗),请提一个不包含“狗”这个字的10个字以内的问题,使回答中“狗”这个字出现的次数至少是问题字数的2倍。
|
|
|
315 |
f'which is less than {len(q_words) * 2} times.'
|
316 |
|
317 |
|
+register_question(
+    {
+        'cn': CN_TEXT_6,
+        'en': EN_TEXT_6,
+    },
+    checkers={
+        'cn': _cn_checker_6,
+        'en': _en_checker_6,
+    },
+    name={'cn': '2-6 得寸进狗', 'en': '2-6'},
+    level=2
+)
llmriddles/questions/level3.py
CHANGED
@@ -1,5 +1,6 @@
|
|
1 |
from .question import register_question
|
2 |
|
|
|
3 |
def count_words(text: str, contain_punctuation: bool = False):
|
4 |
chinese_words = []
|
5 |
english_words = []
|
@@ -20,9 +21,10 @@ def count_words(text: str, contain_punctuation: bool = False):
|
|
20 |
temp_english_words = []
|
21 |
other_words.append(char)
|
22 |
if contain_punctuation:
|
23 |
-
return len(chinese_words)+len(english_words)+len(other_words)
|
24 |
else:
|
25 |
-
return len(chinese_words)+len(english_words)
|
|
|
26 |
|
27 |
CN_TEXT_1 = """
|
28 |
第三章第一题(自说自话),请你输入一个至少10个字的提问,使模型的回答和提问完全一样。
|
@@ -36,23 +38,30 @@ For the first question in chapter 3, please enter a question of at least ten wor
|
|
36 |
Please enter your query below and click the submit button
|
37 |
"""
|
38 |
|
|
|
39 |
def _checker_1(question_text: str, user_text: str, answer_text: str, lang: str):
|
40 |
_ = question_text, lang
|
41 |
answer_text = answer_text.strip()
|
42 |
user_text = user_text.strip()
|
43 |
|
44 |
if count_words(user_text) < 10:
|
45 |
-
return False, "用户的问题长度应该至少10个字" if lang=='cn' else 'Question should be no less than 10 words.'
|
46 |
|
47 |
if answer_text == user_text:
|
48 |
return True, None
|
49 |
else:
|
50 |
return False, None
|
51 |
|
52 |
-
|
53 |
-
|
54 |
-
|
55 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
56 |
|
57 |
CN_TEXT_2 = """
|
58 |
第三章第二题(话自说自),请你输入一个至少10个字的提问,使模型的回答是问题的反序。
|
@@ -66,15 +75,16 @@ For the second question in chapter 3, please enter a question of at least ten wo
|
|
66 |
Please enter your query below and click the submit button
|
67 |
"""
|
68 |
|
|
|
69 |
def _checker_2(question_text: str, user_text: str, answer_text: str, lang: str):
|
70 |
_ = question_text, lang
|
71 |
answer_text = answer_text.strip()
|
72 |
user_text = user_text.strip()
|
73 |
|
74 |
if count_words(user_text) < 10:
|
75 |
-
return False, "用户的问题长度应该至少10个字" if lang=='cn' else 'Question should be no less than 10 words.'
|
76 |
|
77 |
-
if lang=='cn':
|
78 |
user_text = user_text[::-1]
|
79 |
else:
|
80 |
words = user_text.split()
|
@@ -86,10 +96,16 @@ def _checker_2(question_text: str, user_text: str, answer_text: str, lang: str):
|
|
86 |
else:
|
87 |
return False, None
|
88 |
|
89 |
-
|
90 |
-
|
91 |
-
|
92 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
93 |
|
94 |
CN_TEXT_3 = """
|
95 |
第三章第三题(就是敢说),请你输入一个不包含“1”和“3”的提问,使模型的回答是一字不差的“1+1=3”(不需要引号)。
|
@@ -103,23 +119,30 @@ For the third question in chapter 3, please enter a question not contain "1" or
|
|
103 |
Please enter your query below and click the submit button
|
104 |
"""
|
105 |
|
|
|
106 |
def _checker_3(question_text: str, user_text: str, answer_text: str, lang: str):
|
107 |
_ = question_text, lang
|
108 |
answer_text = answer_text.strip()
|
109 |
user_text = user_text.strip()
|
110 |
|
111 |
if '1' in user_text or '3' in user_text:
|
112 |
-
return False, "用户的问题不能包含“1”或“3”" if lang=='cn' else 'Question should not contain \"1\" or \"3\".'
|
113 |
|
114 |
if answer_text == '1+1=3':
|
115 |
return True, None
|
116 |
else:
|
117 |
return False, None
|
118 |
|
119 |
-
|
120 |
-
|
121 |
-
|
122 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
123 |
|
124 |
# CN_TEXT_4 = """
|
125 |
# 第三章第四题(回文协变),请你输入一个本身不是回文串的问题,使得正着问和倒着问时,模型的回答本身不是回文且也是逆序。
|
|
|
1 |
from .question import register_question
|
2 |
|
3 |
+
|
4 |
def count_words(text: str, contain_punctuation: bool = False):
|
5 |
chinese_words = []
|
6 |
english_words = []
|
|
|
21 |
temp_english_words = []
|
22 |
other_words.append(char)
|
23 |
if contain_punctuation:
|
24 |
+
return len(chinese_words) + len(english_words) + len(other_words)
|
25 |
else:
|
26 |
+
return len(chinese_words) + len(english_words)
|
27 |
+
|
28 |
|
29 |
CN_TEXT_1 = """
|
30 |
第三章第一题(自说自话),请你输入一个至少10个字的提问,使模型的回答和提问完全一样。
|
|
|
38 |
Please enter your query below and click the submit button
|
39 |
"""
|
40 |
|
41 |
+
|
42 |
def _checker_1(question_text: str, user_text: str, answer_text: str, lang: str):
|
43 |
_ = question_text, lang
|
44 |
answer_text = answer_text.strip()
|
45 |
user_text = user_text.strip()
|
46 |
|
47 |
if count_words(user_text) < 10:
|
48 |
+
return False, "用户的问题长度应该至少10个字" if lang == 'cn' else 'Question should be no less than 10 words.'
|
49 |
|
50 |
if answer_text == user_text:
|
51 |
return True, None
|
52 |
else:
|
53 |
return False, None
|
54 |
|
55 |
+
|
+register_question(
+    {
+        'cn': CN_TEXT_1,
+        'en': EN_TEXT_1,
+    },
+    checkers=_checker_1,
+    name={'cn': '3-1 自说自话', 'en': '3-1'},
+    level=3,
+)
65 |
|
66 |
CN_TEXT_2 = """
|
67 |
第三章第二题(话自说自),请你输入一个至少10个字的提问,使模型的回答是问题的反序。
|
|
|
75 |
Please enter your query below and click the submit button
|
76 |
"""
|
77 |
|
78 |
+
|
79 |
def _checker_2(question_text: str, user_text: str, answer_text: str, lang: str):
|
80 |
_ = question_text, lang
|
81 |
answer_text = answer_text.strip()
|
82 |
user_text = user_text.strip()
|
83 |
|
84 |
if count_words(user_text) < 10:
|
85 |
+
return False, "用户的问题长度应该至少10个字" if lang == 'cn' else 'Question should be no less than 10 words.'
|
86 |
|
87 |
+
if lang == 'cn':
|
88 |
user_text = user_text[::-1]
|
89 |
else:
|
90 |
words = user_text.split()
|
|
|
96 |
else:
|
97 |
return False, None
|
98 |
|
99 |
+
|
+register_question(
+    {
+        'cn': CN_TEXT_2,
+        'en': EN_TEXT_2,
+    },
+    checkers=_checker_2,
+    name={'cn': '3-2 话自说自', 'en': '3-2'},
+    level=3,
+)
109 |
|
110 |
CN_TEXT_3 = """
|
111 |
第三章第三题(就是敢说),请你输入一个不包含“1”和“3”的提问,使模型的回答是一字不差的“1+1=3”(不需要引号)。
|
|
|
119 |
Please enter your query below and click the submit button
|
120 |
"""
|
121 |
|
122 |
+
|
123 |
def _checker_3(question_text: str, user_text: str, answer_text: str, lang: str):
|
124 |
_ = question_text, lang
|
125 |
answer_text = answer_text.strip()
|
126 |
user_text = user_text.strip()
|
127 |
|
128 |
if '1' in user_text or '3' in user_text:
|
129 |
+
return False, "用户的问题不能包含“1”或“3”" if lang == 'cn' else 'Question should not contain \"1\" or \"3\".'
|
130 |
|
131 |
if answer_text == '1+1=3':
|
132 |
return True, None
|
133 |
else:
|
134 |
return False, None
|
135 |
|
136 |
+
|
+register_question(
+    {
+        'cn': CN_TEXT_3,
+        'en': EN_TEXT_3,
+    },
+    checkers=_checker_3,
+    name={'cn': '3-3 就是敢说', 'en': '3-3'},
+    level=3,
+)
146 |
|
147 |
# CN_TEXT_4 = """
|
148 |
# 第三章第四题(回文协变),请你输入一个本身不是回文串的问题,使得正着问和倒着问时,模型的回答本身不是回文且也是逆序。
|
llmriddles/questions/level4.py
CHANGED
@@ -1,6 +1,7 @@
|
|
1 |
-
from .question import register_question
|
2 |
import re
|
3 |
|
|
|
|
|
4 |
|
5 |
def check_if_is_number(text: str):
|
6 |
try:
|
@@ -85,13 +86,19 @@ def _checker_3(question_text: str, user_text: str, answer_text: str, lang: str):
|
|
85 |
return False, "问题应该是一个正整数" if lang == 'cn' else 'Question should be a positive integer.'
|
86 |
elif int(question_text) == 1:
|
87 |
return False, "问题应该是一个大于1的正整数" if lang == 'cn' else 'Question should be a positive integer greater than 1.'
|
88 |
-
elif int(question_text)-1 not in get_all_numbers_in_a_sentence(answer_text) or int(
|
|
|
89 |
return False, "回答中应该包含一个与问题相差1的数字" if lang == 'cn' else 'Answer should contain a number that is exactly 1 different from the question.'
|
90 |
else:
|
91 |
return True, None
|
92 |
|
93 |
|
94 |
-
register_question(
|
95 |
-
|
96 |
-
|
97 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
import re
|
2 |
|
3 |
+
from .question import register_question
|
4 |
+
|
5 |
|
6 |
def check_if_is_number(text: str):
|
7 |
try:
|
|
|
86 |
return False, "问题应该是一个正整数" if lang == 'cn' else 'Question should be a positive integer.'
|
87 |
elif int(question_text) == 1:
|
88 |
return False, "问题应该是一个大于1的正整数" if lang == 'cn' else 'Question should be a positive integer greater than 1.'
|
89 |
+
elif int(question_text) - 1 not in get_all_numbers_in_a_sentence(answer_text) or int(
|
90 |
+
question_text) + 1 not in get_all_numbers_in_a_sentence(answer_text):
|
91 |
return False, "回答中应该包含一个与问题相差1的数字" if lang == 'cn' else 'Answer should contain a number that is exactly 1 different from the question.'
|
92 |
else:
|
93 |
return True, None
|
94 |
|
95 |
|
+register_question(
+    {
+        'cn': CN_TEXT_3,
+        'en': EN_TEXT_3,
+    },
+    checkers=_checker_3,
+    name={'cn': '4-3 自然之密', 'en': '4-3'},
+    level=4,
+)
|
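Review note on `_checker_3` in `level4.py`: the error message asks for a number that differs from the input by exactly 1, but the condition in the diff joins the two `not in` tests with `or`, so it rejects the answer unless both `n - 1` and `n + 1` appear. Below is a minimal sketch of the check, assuming that either neighbour is meant to be enough. `extract_numbers` and `has_neighbor` are hypothetical names used only for illustration; `extract_numbers` is a stand-in for the repository's `get_all_numbers_in_a_sentence`, whose body is not shown in this diff.

```python
import re
from typing import List


def extract_numbers(text: str) -> List[int]:
    # Hypothetical stand-in for get_all_numbers_in_a_sentence:
    # collect every run of ASCII digits as an integer.
    return [int(m) for m in re.findall(r'[0-9]+', text)]


def has_neighbor(n: int, answer_text: str) -> bool:
    # True when the answer mentions n - 1 or n + 1; either one suffices.
    numbers = set(extract_numbers(answer_text))
    return (n - 1) in numbers or (n + 1) in numbers
```

If both neighbours really are required, the `or` in `has_neighbor` would become `and`, and the error message would need to say so.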
llmriddles/questions/level5.py
CHANGED
@@ -1,5 +1,6 @@
|
|
1 |
from .question import register_question
|
2 |
|
|
|
3 |
def count_words(text: str, contain_punctuation: bool = False):
|
4 |
chinese_words = []
|
5 |
english_words = []
|
@@ -20,12 +21,13 @@ def count_words(text: str, contain_punctuation: bool = False):
|
|
20 |
temp_english_words = []
|
21 |
other_words.append(char)
|
22 |
if contain_punctuation:
|
23 |
-
return len(chinese_words)+len(english_words)+len(other_words)
|
24 |
else:
|
25 |
-
return len(chinese_words)+len(english_words)
|
|
|
26 |
|
27 |
CN_TEXT_1 = """
|
28 |
-
|
29 |
|
30 |
请在下面的输入框内填写你的提问并点击按钮提交。
|
31 |
"""
|
@@ -36,21 +38,27 @@ For the first question in chapter 5, Please construct a question of no less than
|
|
36 |
Please enter your query below and click the submit button
|
37 |
"""
|
38 |
|
|
|
39 |
def _checker_1(question_text: str, user_text: str, answer_text: str, lang: str):
|
40 |
_ = question_text, lang
|
41 |
answer_text = answer_text.strip()
|
42 |
user_text = user_text.strip()
|
43 |
|
44 |
if count_words(user_text) < 10:
|
45 |
-
return False, "用户的问题长度应该至少10个字" if lang=='cn' else 'Question should be no less than 10 words.'
|
46 |
|
47 |
if answer_text == user_text:
|
48 |
return True, None
|
49 |
else:
|
50 |
return False, None
|
51 |
|
52 |
-
register_question({
|
53 |
-
'cn': CN_TEXT_1,
|
54 |
-
'en': EN_TEXT_1,
|
55 |
-
}, _checker_1, level=5)
|
56 |
|
1 |
from .question import register_question
|
2 |
|
3 |
+
|
4 |
def count_words(text: str, contain_punctuation: bool = False):
|
5 |
chinese_words = []
|
6 |
english_words = []
|
|
|
21 |
temp_english_words = []
|
22 |
other_words.append(char)
|
23 |
if contain_punctuation:
|
24 |
+
return len(chinese_words) + len(english_words) + len(other_words)
|
25 |
else:
|
26 |
+
return len(chinese_words) + len(english_words)
|
27 |
+
|
28 |
|
29 |
CN_TEXT_1 = """
|
30 |
+
第五章第一题(口是心非),请构造一个不少于十个字的问题,使得回答中不包含问题中的任意字符。
|
31 |
|
32 |
请在下面的输入框内填写你的提问并点击按钮提交。
|
33 |
"""
|
|
|
38 |
Please enter your query below and click the submit button
|
39 |
"""
|
40 |
|
41 |
+
|
42 |
def _checker_1(question_text: str, user_text: str, answer_text: str, lang: str):
|
43 |
_ = question_text, lang
|
44 |
answer_text = answer_text.strip()
|
45 |
user_text = user_text.strip()
|
46 |
|
47 |
if count_words(user_text) < 10:
|
48 |
+
return False, "用户的问题长度应该至少10个字" if lang == 'cn' else 'Question should be no less than 10 words.'
|
49 |
|
50 |
if answer_text == user_text:
|
51 |
return True, None
|
52 |
else:
|
53 |
return False, None
|
54 |
|
55 |
|
+register_question(
+    {
+        'cn': CN_TEXT_1,
+        'en': EN_TEXT_1,
+    },
+    checkers=_checker_1,
+    name={'cn': '5-1 口是心非', 'en': '5-1'},
+    level=5,
+)
llmriddles/questions/question.py
CHANGED
@@ -3,14 +3,15 @@ from dataclasses import dataclass
|
|
3 |
from typing import Union, Mapping, Literal, Callable, Tuple, List, Optional
|
4 |
|
5 |
LangTyping = Literal['en', 'cn']
|
6 |
-
MultiLangCheckerTyping = Callable[[str, str, str], Tuple[bool, Optional[str]]]
|
7 |
-
SingleLangCheckerTyping = Callable[[str, str], Tuple[bool, Optional[str]]]
|
8 |
|
9 |
|
10 |
@dataclass
|
11 |
class Question:
|
12 |
texts: Mapping[str, str]
|
13 |
checker: MultiLangCheckerTyping
|
|
|
14 |
level: int
|
15 |
|
16 |
|
@@ -19,6 +20,7 @@ _KNOWN_PROBLEMS = []
|
|
19 |
|
20 |
def register_question(text: Union[Mapping[str, str], str],
|
21 |
checkers: Union[Mapping[str, SingleLangCheckerTyping], MultiLangCheckerTyping],
|
|
|
22 |
level: int = 1, default_lang='cn'):
|
23 |
if isinstance(checkers, collections.abc.Mapping):
|
24 |
_origin_checkers = checkers
|
@@ -35,7 +37,12 @@ def register_question(text: Union[Mapping[str, str], str],
|
|
35 |
else:
|
36 |
texts = text
|
37 |
|
38 |
-
|
|
|
|
|
|
|
|
|
|
|
39 |
|
40 |
|
41 |
def list_ordered_questions() -> List[Question]:
|
|
|
3 |
from typing import Union, Mapping, Literal, Callable, Tuple, List, Optional
|
4 |
|
5 |
LangTyping = Literal['en', 'cn']
|
+MultiLangCheckerTyping = Callable[[str, str, str, str], Tuple[bool, Optional[str]]]
+SingleLangCheckerTyping = Callable[[str, str, str], Tuple[bool, Optional[str]]]
8 |
|
9 |
|
10 |
@dataclass
|
11 |
class Question:
|
12 |
texts: Mapping[str, str]
|
13 |
checker: MultiLangCheckerTyping
|
14 |
+
names: Mapping[str, str]
|
15 |
level: int
|
16 |
|
17 |
|
|
|
20 |
|
21 |
def register_question(text: Union[Mapping[str, str], str],
|
22 |
checkers: Union[Mapping[str, SingleLangCheckerTyping], MultiLangCheckerTyping],
|
23 |
+
name: Union[Mapping[str, str], str],
|
24 |
level: int = 1, default_lang='cn'):
|
25 |
if isinstance(checkers, collections.abc.Mapping):
|
26 |
_origin_checkers = checkers
|
|
|
37 |
else:
|
38 |
texts = text
|
39 |
|
+    if isinstance(name, str):
+        names = {default_lang: name}
+    else:
+        names = name
+
+    _KNOWN_PROBLEMS.append(Question(texts, checker, names, level))
46 |
|
47 |
|
48 |
def list_ordered_questions() -> List[Question]:
|