	---
	language:
	- en
	license: apache-2.0
	datasets:
	- appvoid/no-prompt-15k
	pipeline_tag: text-generation
	model-index:
	- name: palmer-002
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 34.47
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=appvoid/palmer-002
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 59.41
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=appvoid/palmer-002
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 25.94
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=appvoid/palmer-002
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 37.06
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=appvoid/palmer-002
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 62.67
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=appvoid/palmer-002
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 1.21
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=appvoid/palmer-002
	      name: Open LLM Leaderboard
	---
	![palmer](https://huggingface.co/appvoid/palmer-001/resolve/main/new-logo.jpg)
	# palmer
	### a better base model
	palmer is a series of ~1b-parameter language models fine-tuned to be used as base models, without relying on custom prompts for tasks. This means it can be further fine-tuned on more data with custom prompts as usual, or used for downstream tasks like any other base model. The model has the best of both worlds: some "bias" to act as an assistant, but also the ability to predict the next word from its internet knowledge base. It's a 1.1b llama 2 model, so you can use it with your favorite tools/frameworks.

	### evaluation 🧪
	note that these results use a zero-shot setting, as opposed to the open llm leaderboard's few-shot evals
	```
	Model         | ARC_C  | HellaSwag | PIQA   | Winogrande | Average |
	tinyllama-2   | 0.2807 | 0.5463    | 0.7067 | 0.5683     | 0.5255  |
	palmer-001    | 0.2807 | 0.5524    | 0.7106 | 0.5896     | 0.5333  |
	babbage-001   | 0.2944 | 0.5448    | 0.7410 | 0.5935     | 0.5434  |
	deacon-1b     | 0.2944 | 0.5727    | 0.7040 | 0.5801     | 0.5434  |
	tinyllama-2.5 | 0.3191 | 0.5896    | 0.7307 | 0.5872     | 0.5566  |
	palmer-002    | 0.3242 | 0.5956    | 0.7345 | 0.5888     | 0.5607  |
	babbage-002   | 0.3285 | 0.6380    | 0.7606 | 0.6085     | 0.5839  |
	```

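	In this zero-shot comparison, palmer-002 has the highest average among the tinyllama-size models listed and, as of now, is the best tinyllama-size base model. This supports the LIMA paper's point that a small, curated fine-tuning set can go a long way, and it makes the model a good open-source alternative to openai's `babbage-002`.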
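To make the comparison concrete, here is a small sketch that computes the per-task gap between palmer-002 and two of its neighbours in the zero-shot table. The scores are copied from that table; the variable and function names are illustrative and not part of any tooling used for this model.

```python
# Zero-shot scores copied from the evaluation table in this card.
TASKS = ["ARC_C", "HellaSwag", "PIQA", "Winogrande"]
SCORES = {
    "tinyllama-2.5": [0.3191, 0.5896, 0.7307, 0.5872],
    "palmer-002": [0.3242, 0.5956, 0.7345, 0.5888],
    "babbage-002": [0.3285, 0.6380, 0.7606, 0.6085],
}


def per_task_gap(model: str, baseline: str) -> dict:
    """Return score(model) - score(baseline) for each task."""
    return {
        task: a - b
        for task, a, b in zip(TASKS, SCORES[model], SCORES[baseline])
    }


for baseline in ("tinyllama-2.5", "babbage-002"):
    gap = per_task_gap("palmer-002", baseline)
    line = ", ".join(f"{task} {delta:+.4f}" for task, delta in gap.items())
    print(f"palmer-002 vs {baseline}: {line}")
```

On these numbers, palmer-002 is ahead of tinyllama-2.5 on all four tasks and behind `babbage-002` on all four, with the smallest gap on ARC_C.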

	### training 🦾
	Training took ~3.5 P100 gpu hours on 15,000 shuffled gpt-4 samples. palmer was fine-tuned using lower learning rates so that it keeps as much general knowledge as possible.

	### prompt 📝
	```
	no prompt 🚀
	```
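Since there is no prompt template, the input text is passed to the model as-is and the model simply continues it. A minimal sketch of this, assuming the Hugging Face `transformers` library with a PyTorch backend, and assuming the repo id `appvoid/palmer-002` (inferred from the leaderboard links in this card); the `complete` helper is hypothetical:

```python
# Assumed repo id, inferred from the leaderboard URLs in this card.
MODEL_ID = "appvoid/palmer-002"


def complete(text: str, max_new_tokens: int = 32) -> str:
    """Continue `text` directly: no system prompt or chat template is applied."""
    # Imported lazily so the heavy dependencies are only needed when generating.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize the raw text and greedily decode a short continuation.
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("The capital of France is")` downloads the weights on first use and returns the input followed by the model's continuation.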
	<a href="https://ko-fi.com/appvoid" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 48px !important;width: 180px !important; filter: invert(70%);" ></a>
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_appvoid__palmer-002)

	| Metric                            | Value |
	|-----------------------------------|------:|
	| Avg.                              | 36.79 |
	| AI2 Reasoning Challenge (25-Shot) | 34.47 |
	| HellaSwag (10-Shot)               | 59.41 |
	| MMLU (5-Shot)                     | 25.94 |
	| TruthfulQA (0-shot)               | 37.06 |
	| Winogrande (5-shot)               | 62.67 |
	| GSM8k (5-shot)                    |  1.21 |
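
The `Avg.` row is consistent with an unweighted mean of the six benchmark scores. A short sketch that recomputes it from the values in the table (the dictionary name is illustrative):

```python
# Leaderboard scores copied from the table above.
leaderboard = {
    "AI2 Reasoning Challenge (25-Shot)": 34.47,
    "HellaSwag (10-Shot)": 59.41,
    "MMLU (5-Shot)": 25.94,
    "TruthfulQA (0-shot)": 37.06,
    "Winogrande (5-shot)": 62.67,
    "GSM8k (5-shot)": 1.21,
}

# Unweighted mean across the six benchmarks.
average = sum(leaderboard.values()) / len(leaderboard)
print(f"Avg. {average:.2f}")  # prints "Avg. 36.79"
```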