---
language:
- en
license: apache-2.0
datasets:
- appvoid/no-prompt-15k
pipeline_tag: text-generation
model-index:
- name: palmer-002
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 34.47
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=appvoid/palmer-002
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 59.41
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=appvoid/palmer-002
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 25.94
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=appvoid/palmer-002
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 37.06
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=appvoid/palmer-002
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 62.67
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=appvoid/palmer-002
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.21
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=appvoid/palmer-002
      name: Open LLM Leaderboard
---
![palmer](https://huggingface.co/appvoid/palmer-001/resolve/main/new-logo.jpg)
# palmer
### a better base model 
palmer is a series of ~1b parameter language models fine-tuned to be used as base models rather than requiring custom prompts for tasks. This means it can be further fine-tuned on more data with custom prompts as usual, or used for downstream tasks like any other base model. The model offers the best of both worlds: some "bias" toward acting as an assistant, but also the ability to predict the next word from its internet knowledge base. It's a 1.1b llama 2 model, so you can use it with your favorite tools and frameworks.
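
Since palmer-002 is a standard 1.1b llama 2 checkpoint, it should load with the usual `transformers` API. A minimal sketch (the repo id `appvoid/palmer-002` comes from this card; the input text and generation settings are illustrative):

```python
# minimal sketch: load palmer-002 and complete raw text, no prompt template
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "appvoid/palmer-002"  # repo id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# feed plain text and let the model continue it
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```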

### evaluation 🧪
note that these are zero-shot results, as opposed to the Open LLM Leaderboard's few-shot evals
| Model         |  ARC-C | HellaSwag |   PIQA | Winogrande | Average |
|---------------|-------:|----------:|-------:|-----------:|--------:|
| tinyllama-2   | 0.2807 |    0.5463 | 0.7067 |     0.5683 |  0.5255 |
| palmer-001    | 0.2807 |    0.5524 | 0.7106 |     0.5896 |  0.5333 |
| babbage-001   | 0.2944 |    0.5448 | 0.7410 |     0.5935 |  0.5434 |
| deacon-1b     | 0.2944 |    0.5727 | 0.7040 |     0.5801 |  0.5434 |
| tinyllama-2.5 | 0.3191 |    0.5896 | 0.7307 |     0.5872 |  0.5566 |
| palmer-002    | 0.3242 |    0.5956 | 0.7345 |     0.5888 |  0.5607 |
| babbage-002   | 0.3285 |    0.6380 | 0.7606 |     0.6085 |  0.5839 |

This model shows strong performance and, as of this writing, is the best tinyllama-sized base model on these benchmarks. It also supports the LIMA paper's finding that a small amount of high-quality data is enough for effective fine-tuning, and it serves as a good open-source alternative to OpenAI's `babbage-002`.
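
The zero-shot numbers above should be reproducible with EleutherAI's lm-evaluation-harness. A sketch of one way to run it (task names and the `simple_evaluate` call follow the harness's v0.4+ API; exact scores may vary with harness version):

```python
# sketch: reproduce the zero-shot table with lm-evaluation-harness
# pip install lm-eval
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=appvoid/palmer-002",
    tasks=["arc_challenge", "hellaswag", "piqa", "winogrande"],
    num_fewshot=0,  # zero-shot, unlike the leaderboard's few-shot setup
)
for task, metrics in results["results"].items():
    print(task, metrics)
```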

### training 🦾
Training took ~3.5 P100 GPU hours on 15,000 shuffled GPT-4-generated samples. palmer was fine-tuned with a lower learning rate to preserve as much of the base model's general knowledge as possible.
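
A minimal sketch of this kind of low-learning-rate fine-tune with the `transformers` Trainer. The dataset id comes from this card's metadata, but the base checkpoint, the `text` column name, and every hyperparameter below are illustrative assumptions, not the actual palmer-002 recipe:

```python
# sketch: low-LR fine-tune of a tinyllama-size base on no-prompt-15k
# hyperparameters are illustrative, not the recipe used for palmer-002
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"  # assumed base
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # llama tokenizers ship no pad token
model = AutoModelForCausalLM.from_pretrained(base)

# assumes the dataset exposes a "text" column
ds = load_dataset("appvoid/no-prompt-15k", split="train").shuffle(seed=42)
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
            remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="palmer-ft",
        learning_rate=1e-5,  # deliberately low to keep general knowledge
        num_train_epochs=1,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
    ),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```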

### prompt 📝
```
no prompt 🚀
```
<a href="https://ko-fi.com/appvoid" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 48px !important;width: 180px !important; filter: invert(70%);" ></a>
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_appvoid__palmer-002)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |36.79|
|AI2 Reasoning Challenge (25-Shot)|34.47|
|HellaSwag (10-Shot)              |59.41|
|MMLU (5-Shot)                    |25.94|
|TruthfulQA (0-shot)              |37.06|
|Winogrande (5-shot)              |62.67|
|GSM8k (5-shot)                   | 1.21|