---
library_name: transformers
tags: []
---
|
|
|
# Falcon-11B-Base-V1.1 |
|
The Falcon-11B-Base-V1.1 Large Language Model (LLM) is a pretrained generative text model with 11.1 billion parameters.
|
|
|
## Model Specifications |
|
- Base model (not instruction-tuned)
- Flash Attention 2
- Untied LM head and word embeddings (adds ~300M parameters on top of the 10.8B backbone)
- 11.1B parameters
- RoPE theta: 500,042
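
A quick way to confirm these specifications from the released checkpoint is to inspect its config. The sketch below is illustrative; the attribute names assume a Falcon-style config and may differ in our custom config class:

```python
from transformers import AutoConfig

# Load the config shipped with the checkpoint (requires trust_remote_code
# because the repo uses custom modeling code)
config = AutoConfig.from_pretrained("ruliadai/falcon-base-v1.1", trust_remote_code=True)

print(config.rope_theta)           # expected: 500042
print(config.tie_word_embeddings)  # expected: False (untied LM head)
```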
|
|
|
|
|
|
|
### Inference Script
|
Run inference with `trust_remote_code=True` to use our modeling code. We show an example below with basic generation hyperparameters; save it as `inference.py` for the run instructions in the next section.
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
base_model_id = "ruliadai/falcon-base-v1.1"

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
)
tokenizer.pad_token = tokenizer.eos_token

# Run inference in a simple prompt loop
while True:
    prompt = input("Instruction: ")
    model_input = tokenizer(
        prompt, return_tensors="pt", return_token_type_ids=False
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(
            **model_input,
            max_new_tokens=800,
            do_sample=False,  # greedy decoding
            repetition_penalty=1.15,
            use_cache=True,
        )
    print(tokenizer.decode(output[0]))
```
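
The tokenizer is configured with `padding_side="left"` so that, when prompts of different lengths are batched, generation continues directly after each prompt rather than after padding tokens. A minimal batched-generation sketch, reusing the model and tokenizer from above (the prompts are illustrative):

```python
# Batched greedy generation; left padding keeps each prompt right-aligned
prompts = ["The capital of France is", "Water boils at"]
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **batch,
        max_new_tokens=32,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,  # silences the missing-pad-token warning
    )

for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```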
|
|
|
### How to Run Inference
|
|
|
Set up and activate a virtual environment (venv or conda):
|
|
|
```bash
python3 -m venv env \
  && source env/bin/activate
```
|
|
|
Install torch: |
|
```bash
pip3 install torch torchvision torchaudio
```
|
Note: you may need a torch build that matches your system and drivers; see https://pytorch.org/get-started/locally/ for the correct install command.
|
|
|
|
|
Install requirements: |
|
```bash
pip3 install --upgrade --force-reinstall transformers accelerate flash-attn hf_transfer
```
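
Before running the script, it can help to confirm that the key packages import and that a CUDA GPU is visible. A quick sketch (flash-attn requires an NVIDIA GPU; version numbers will vary):

```python
# Environment sanity check (illustrative)
import torch
import transformers
import flash_attn

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("flash-attn:", flash_attn.__version__)
print("CUDA available:", torch.cuda.is_available())
```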
|
|
|
Save the inference script above as `inference.py`, then run it:
|
|
|
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> python3 inference.py
```
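
`HF_HUB_ENABLE_HF_TRANSFER=1` enables the faster `hf_transfer` download backend, and `HF_TOKEN` authenticates you to the Hub. If you prefer not to pass the token on the command line, you can authenticate programmatically instead; a sketch using `huggingface_hub`, which is installed alongside `transformers` (the token value is a placeholder):

```python
# Alternative to the HF_TOKEN environment variable
from huggingface_hub import login

login(token="hf_...")  # or run `huggingface-cli login` once, interactively
```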
|
|
|
|
|
If flash-attn fails to import or build, reinstall it from a clean cache:
```bash
pip3 uninstall flash-attn
pip3 cache purge
pip3 install flash-attn
```
If the build still fails, installing with `pip3 install flash-attn --no-build-isolation` (as recommended in the flash-attn README) may help.
|
|
|
|
|
## Model Evaluation |
|
|
|
### Measured Benchmarks (by Ruliad) |
|
|
|
| MODEL | AVERAGE | MMLU (5-shot) | TruthfulQA (0-shot) | ARC (25-shot) | GSM8K (5-shot) | HellaSwag (10-shot) | WinoGrande (5-shot) |
| --------------- | ------- | ------------- | ------------------- | ------------- | -------------- | ------------------- | ------------------- |
| Falcon-Base-v1.1 | 0.6440 | 0.5683 | 0.5263 | 0.6041 | 0.5542 | 0.8280 | 0.7806 |
| Llama-3-8B | 0.6300 | 0.6513 | 0.4385 | 0.5904 | 0.5034 | 0.8223 | 0.7751 |
| Mistral-7B-v0.1 | 0.6130 | 0.6233 | 0.4258 | 0.6220 | 0.3859 | 0.8332 | 0.7861 |
|
|
|
### Evaluation Replication |
|
|
|
**Install Eval Harness** |
|
|
|
To install the `lm-eval` package from its GitHub repository, run:
|
```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
pip install hf_transfer accelerate transformers flash_attn
```
|
**Benchmarking** |
|
|
|
To evaluate our model: |
|
|
|
Evaluating MMLU, GSM8K, and WinoGrande (5-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks mmlu,gsm8k,winogrande \
    --device cuda:0 \
    --num_fewshot 5 \
    --batch_size 1
```
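
Recent versions of the harness (v0.4+) also expose a programmatic entry point. The sketch below mirrors the command above, assuming a harness version where `lm_eval.simple_evaluate` and the `hf` model type are available:

```python
import lm_eval

# Programmatic equivalent of the 5-shot CLI run above (API assumes lm-eval >= 0.4)
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True",
    tasks=["mmlu", "gsm8k", "winogrande"],
    num_fewshot=5,
    batch_size=1,
)
print(results["results"])
```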
|
|
|
Evaluating TruthfulQA (0-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks truthfulqa_mc2 \
    --device cuda:0 \
    --batch_size 1
```
|
|
|
Evaluating HellaSwag (10-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks hellaswag \
    --device cuda:0 \
    --num_fewshot 10 \
    --batch_size 1
```
|
|
|
Evaluating ARC-Challenge (25-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks arc_challenge \
    --device cuda:0 \
    --num_fewshot 25 \
    --batch_size 1
```
|
|