metadata
language:
- en
license: apache-2.0
tags:
- generated_from_trainer
datasets:
- HuggingFaceH4/no_robots
base_model: openchat/openchat_3.5
widget:
  - text: >
      <|system|>
      You are a friendly chatbot who always responds in the style of a
      pirate</s>
      <|user|>
      How many helicopters can a human eat in one sitting?</s>
      <|assistant|>
    output:
      text: >-
        Ahoy there, me hearty! As a friendly pirate chatbot, I be tellin' ye
        that a human cannot eat a helicopter, as it be a large machine made of
        metal and suchlike, not fit for human consumption. A human can eat food,
        like a fine feast of roasted meat and sweet fruits, but a helicopter?
        That be nonsense, me hearty! So, the answer be none, none at all. Arr!
pipeline_tag: text-generation
model-index:
  - name: smol-7b
    results: []
Smol 7B
This model is a fine-tuned version of openchat/openchat_3.5 on the open-source dataset HuggingFaceH4/no_robots, trained using the recipes published in The Alignment Handbook.
Model date
rishiraj/smol-7b was trained between 1st and 3rd December, 2023.
Evaluation
It achieves the following results on the Open LLM Leaderboard. At the time of release, smol-7b was the highest-ranked 7B chat model on the MMLU benchmark.
Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
---|---|---|---|---|---|---|---|
rishiraj/smol-7b | 67.11 | 63.74 | 84.77 | 65 | 46.17 | 80.66 | 62.32 |
argilla/notus-7b-v1 | 63.49 | 64.59 | 84.83 | 63.04 | 54.35 | 79.56 | 34.57 |
Intel/neural-chat-7b-v3-1 | 61.59 | 66.21 | 83.64 | 62.37 | 59.65 | 78.14 | 19.56 |
HuggingFaceH4/zephyr-7b-beta | 61.59 | 62.46 | 84.35 | 60.7 | 57.83 | 77.11 | 27.07 |
Qwen/Qwen-7B | 59.19 | 51.37 | 78.47 | 59.84 | 47.79 | 72.69 | 44.96 |
microsoft/Orca-2-7b | 54.55 | 54.1 | 76.19 | 56.37 | 52.45 | 73.48 | 14.71 |
01-ai/Yi-6B | 54.08 | 55.55 | 76.57 | 64.11 | 41.96 | 74.19 | 12.13 |
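As a quick sanity check, the Average column appears to be the unweighted mean of the six benchmark scores. The sketch below recomputes it for the rishiraj/smol-7b row; the dictionary and variable names are illustrative only and are not part of any leaderboard API.

```python
# Scores for rishiraj/smol-7b, copied from the table above.
scores = {
    "ARC": 63.74,
    "HellaSwag": 84.77,
    "MMLU": 65.00,
    "TruthfulQA": 46.17,
    "Winogrande": 80.66,
    "GSM8K": 62.32,
}

# Assumption: the leaderboard average is the plain arithmetic mean of the six tasks.
average = round(sum(scores.values()) / len(scores), 2)
print(average)
```

The result agrees with the 67.11 reported in the table.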
Inference procedure
Here's how you can run the model using the pipeline() function from 🤗 Transformers:
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="rishiraj/smol-7b", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate"
    },
    {
        "role": "user",
        "content": "How many helicopters can a human eat in one sitting?"
    }
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
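By default, the text-generation pipeline returns the prompt followed by the completion in `generated_text`. The sketch below illustrates, without loading the model, roughly what the formatted prompt looks like and how the echoed prompt can be removed afterwards. The layout mirrors the widget example in this card's metadata and is an assumption; the tokenizer's own chat template remains authoritative. `format_prompt` and `strip_prompt` are hypothetical helper names.

```python
def format_prompt(messages):
    """Approximate the chat layout shown in the widget example.

    This is an assumption for illustration, not the tokenizer's actual template.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages]
    # The trailing assistant tag asks the model to produce the next turn.
    return "".join(parts) + "<|assistant|>\n"


def strip_prompt(generated_text, prompt):
    """Return only the completion when the output starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = format_prompt(messages)
print(prompt)

# Stand-in for outputs[0]["generated_text"]: the prompt plus a made-up completion.
print(strip_prompt(prompt + "Arr, none at all!", prompt))
```

Alternatively, passing `return_full_text=False` to the pipeline call should return only the newly generated text.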
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 128
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1
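The reported total_train_batch_size is consistent with the per-device batch size multiplied by the number of gradient accumulation steps. The card lists distributed_type as multi-GPU but does not state a device count, so `num_devices = 1` below is an assumption chosen only because it reproduces the reported value.

```python
train_batch_size = 4               # per-device batch size, from the list above
gradient_accumulation_steps = 128  # from the list above
num_devices = 1                    # assumption: the device count is not stated in this card

# Effective number of examples contributing to each optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)
```

Under that assumption the product matches the listed total of 512.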
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.0569 | 0.16 | 3 | 2.0409 |
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
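To compare a local environment against the versions listed above, a small check along these lines may help. It is a minimal sketch: the dictionary keys are assumed to be the PyPI distribution names, and `installed_version` is a hypothetical helper rather than part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above; keys are the assumed PyPI distribution names.
expected = {
    "transformers": "4.35.2",
    "torch": "2.1.1+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.14.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Map each package to (expected version, installed version or None).
report = {name: (want, installed_version(name)) for name, want in expected.items()}
for name, (want, have) in report.items():
    status = "ok" if have == want else "differs"
    print(f"{name}: expected {want}, installed {have} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags a difference from the training environment.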
Citation Information
@misc{rishiraj2023smol,
author = {Rishiraj Acharya},
title = {Smol 7B},
year = {2023},
publisher = {Hugging Face},
journal = {Hugging Face repository},
howpublished = {\url{https://huggingface.co/rishiraj/smol-7b}}
}
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 67.11 |
AI2 Reasoning Challenge (25-Shot) | 63.74 |
HellaSwag (10-Shot) | 84.77 |
MMLU (5-Shot) | 65.00 |
TruthfulQA (0-shot) | 46.17 |
Winogrande (5-shot) | 80.66 |
GSM8k (5-shot) | 62.32 |