kollama2-7b-v2 / README.md

Under Construction

This model shows some decoding patterns that originate from the fine-tuning dataset. I will retrain the model to remove these patterns.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 44.91 |
| ARC (25-shot)       | 53.33 |
| HellaSwag (10-shot) | 78.5  |
| MMLU (5-shot)       | 43.61 |
| TruthfulQA (0-shot) | 46.37 |
| Winogrande (5-shot) | 75.61 |
| GSM8K (5-shot)      | 6.52  |
| DROP (3-shot)       | 10.4  |
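
As a quick sanity check, the reported Avg. matches the unweighted mean of the seven benchmark scores listed above. The snippet below is a minimal sketch of that arithmetic. The `scores` dictionary and the equal-weight assumption are illustrative and are not taken from the leaderboard's own code.

```python
# Benchmark scores copied from the results above.
# Assumption: the leaderboard average weights every benchmark equally.
scores = {
    "ARC (25-shot)": 53.33,
    "HellaSwag (10-shot)": 78.5,
    "MMLU (5-shot)": 43.61,
    "TruthfulQA (0-shot)": 46.37,
    "Winogrande (5-shot)": 75.61,
    "GSM8K (5-shot)": 6.52,
    "DROP (3-shot)": 10.4,
}

# Unweighted arithmetic mean over all seven benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 44.91
```

The low GSM8K and DROP scores pull the mean well below the five other benchmarks, so the average alone understates performance on ARC, HellaSwag, and Winogrande.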