Update README.md
README.md (changed)
```diff
@@ -110,7 +110,7 @@ Performance-wise:
 * Taiwan-LLM models responds to multi-turn questions (English) in Traditional Chinese.
 
 
-| Details
+| Details on MT-Bench-tw (0 shot):<br/>Models | STEM |Extraction|Reasoning| Math | Coding | Roleplay| Writing |Humanities|↑ AVG |
 |-----------------------------------------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
 | gpt-3.5-turbo | 7.8 | 6.1 | 5.1 | 6.4 | 6.2 | 8.7 | 7.4 | 9.3 | 7.1 |
 | Yi-34B-Chat | 9.0 | 4.8 | 5.7 | 4.0 | 4.7 | 8.5 | 8.7 | 9.8 | 6.9 |
@@ -123,7 +123,7 @@ Performance-wise:
 | Taiwan-LLM-7B-v2.1-chat | 5.2 | 2.6 | 2.3 | 1.2 | 3.4 | 6.6 | 5.7 | 6.8 | 4.2 |
 
 
-| Details
+| Details on TMMLU+ (0 shot):<br/>Model | STEM | Social Science | Humanities | Other | ↑ AVG |
 |-----------------------------------------------------|--------------|----------------|------------|------------|---------|
 | Yi-34B-Chat | 47.65 | 64.25 | 52.73 | 54.91 | 54.87 |
 | Qwen-14B-Chat | 43.83 | 55.00 | 48.55 | 46.22 | 48.41 |
```