---
library_name: transformers
tags: []
---
# Falcon-11B-Base-V1.1
The Falcon-11B-Base-V1.1 Large Language Model (LLM) is a pretrained generative text model with 11.1 billion parameters.
## Model Specifications
- Base model (not instruction tuned)
- Flash Attention 2
- Untied LM head and word embeddings (this adds ~300M parameters over the 10.8B tied configuration)
- 11.1B parameters
- RoPE theta of 500,042
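RoPE theta sets how slowly the rotary position frequencies decay; a larger theta stretches the frequency bands and supports longer contexts. A minimal sketch of the standard RoPE inverse-frequency computation using the theta above (the `head_dim=128` value is an assumption for illustration, not stated in this card):

```python
# Standard RoPE inverse frequencies: inv_freq_i = theta^(-2i / head_dim).
# head_dim = 128 is assumed for illustration only.
def rope_inv_freq(head_dim: int = 128, theta: float = 500042.0):
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

freqs = rope_inv_freq()
# freqs[0] is 1.0; later entries decay monotonically toward 1/theta.
```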
### Inference Model
Load the model with `trust_remote_code=True` so that our custom modeling code is used. The example below uses the most basic generation hyperparameters.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
base_model_id = "ruliadai/falcon-base-v1.1"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
)
tokenizer.pad_token = tokenizer.eos_token
model.eval()

# Run inference
while True:
    prompt = input("Instruction: ")
    model_input = tokenizer(
        prompt, return_tensors="pt", return_token_type_ids=False
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **model_input,
            max_new_tokens=800,
            do_sample=False,
            repetition_penalty=1.15,
            use_cache=True,
        )
    print(tokenizer.decode(output_ids[0]))
```
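The `padding_side="left"` setting matters once you batch prompts of different lengths: for a decoder-only model, pad tokens must sit on the left so generation continues from real tokens. A toy, model-free sketch of what left padding produces (illustrative helper, not part of the card's code):

```python
# Toy illustration of left padding, as a batched tokenizer call would produce:
# shorter sequences are padded on the left, and the attention mask zeroes out pads.
def left_pad(batch, pad_id=0):
    width = max(len(seq) for seq in batch)
    padded = [[pad_id] * (width - len(s)) + s for s in batch]
    mask = [[0] * (width - len(s)) + [1] * len(s) for s in batch]
    return padded, mask

ids, mask = left_pad([[5, 6, 7], [8]])
# ids  -> [[5, 6, 7], [0, 0, 8]]
# mask -> [[1, 1, 1], [0, 0, 1]]
```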
### How to run inference
Set up and activate a venv or conda environment:
```bash
python3 -m venv env \
&& source env/bin/activate
```
Install torch:
```bash
pip3 install torch torchvision torchaudio
```
Note that you may need to install torch according to your system requirements and drivers (see https://pytorch.org/get-started/locally/).
Install requirements:
```bash
pip3 install --upgrade --force-reinstall transformers accelerate flash-attn hf_transfer
```
Run script:
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> python3 inference.py
```
If the flash-attn installation is broken, reinstall it from a clean cache:
```bash
pip3 uninstall flash-attn
pip3 cache purge
pip3 install flash-attn
```
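To confirm the reinstall took effect, a quick Python check (a sketch, not part of the original instructions; `find_spec` reports importability without a hard crash if the package is still missing):

```python
import importlib.util

# Check whether flash_attn is importable after the reinstall.
spec = importlib.util.find_spec("flash_attn")
print("flash_attn installed:", spec is not None)
```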
## Model Evaluation
### Measured Benchmarks (by Ruliad)
| MODEL | AVERAGE | MMLU (5-s) | TQA (0-s) | ARC (25-s) | GSM8K (5-s)| HS (10-s) | WG (5-s) |
| --------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| Falcon-Base-v1.1 | 0.6440 | 0.5683 | 0.5263 | 0.6041 | 0.5542 | 0.8280 | 0.7806 |
| Llama-3-8B | 0.6300 | 0.6513 | 0.4385 | 0.5904 | 0.5034 | 0.8223 | 0.7751 |
| Mistral-7B-v0.1 | 0.6130 | 0.6233 | 0.4258 | 0.6220 | 0.3859 | 0.8332 | 0.7861 |
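The AVERAGE column is the mean of the six benchmark scores; since the table values are rounded, the recomputed mean agrees to about three decimal places. A quick sanity check over the table above:

```python
# Scores from the table: MMLU, TQA, ARC, GSM8K, HS, WG (in column order).
rows = {
    "Falcon-Base-v1.1": [0.5683, 0.5263, 0.6041, 0.5542, 0.8280, 0.7806],
    "Llama-3-8B":       [0.6513, 0.4385, 0.5904, 0.5034, 0.8223, 0.7751],
    "Mistral-7B-v0.1":  [0.6233, 0.4258, 0.6220, 0.3859, 0.8332, 0.7861],
}
stated = {"Falcon-Base-v1.1": 0.6440, "Llama-3-8B": 0.6300, "Mistral-7B-v0.1": 0.6130}

for name, scores in rows.items():
    mean = sum(scores) / len(scores)
    # Rounding in the table keeps the stated average within 0.001 of the exact mean.
    assert abs(mean - stated[name]) < 1e-3
```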
### Evaluation Replication
**Install Eval Harness**
To install the `lm-eval` package from the GitHub repository, run:
```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
pip install hf_transfer accelerate transformers flash_attn
```
**Benchmarking**
To evaluate our model:
Evaluating MMLU, GSM8K and WG on 5-Shot
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
--model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
--tasks mmlu,gsm8k,winogrande \
--device cuda:0 \
--num_fewshot 5 \
--batch_size 1
```
Evaluating TQA on 0-Shot
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
--model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
--tasks truthfulqa_mc2 \
--device cuda:0 \
--batch_size 1
```
Evaluating HS on 10-Shot
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
--model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
--tasks hellaswag \
--device cuda:0 \
--num_fewshot 10 \
--batch_size 1
```
Evaluating ARC on 25-Shot
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
--model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
--tasks arc_challenge \
--device cuda:0 \
--num_fewshot 25 \
--batch_size 1
```
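To aggregate the scores afterward, `lm_eval` can write results to disk with `--output_path`; the resulting JSON contains a `results` mapping from task name to metric values. A hedged sketch of collecting one accuracy-like metric per task (the exact metric key, e.g. `acc,none` vs `acc`, varies by harness version, and the `results.json` path is an assumption):

```python
import json

def collect_scores(path="results.json"):
    """Extract one accuracy-like metric per task from an lm-eval results file."""
    with open(path) as f:
        data = json.load(f)
    scores = {}
    for task, metrics in data["results"].items():
        # Metric key names vary by harness version; take the first match.
        for key, value in metrics.items():
            if key.split(",")[0] in ("acc", "acc_norm", "exact_match", "mc2"):
                scores[task] = value
                break
    return scores
```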