---
library_name: transformers
tags: []
---
# Falcon-11B-Base-V1.1
The Falcon-11B-Base-V1.1 Large Language Model (LLM) is a pretrained generative text model with 11.1 billion parameters.
## Model Specifications
- Base model (not instruction-tuned)
- Flash Attention 2
- Untied LM head and word embeddings (this adds 300M parameters over the 10.8B tied-weight count; see the config snippet below)
- 11.1B parameters
- RoPE theta of 500,042
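You can confirm these settings directly from the published config without downloading the weights. A minimal sketch, assuming the config exposes the standard Hugging Face `rope_theta` and `tie_word_embeddings` fields:
```python
from transformers import AutoConfig

# Read only the model config from the Hub (no weights downloaded).
config = AutoConfig.from_pretrained(
    "ruliadai/falcon-base-v1.1", trust_remote_code=True
)
print(config.rope_theta)           # expected: 500042
print(config.tie_word_embeddings)  # expected: False (untied LM head)
```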
### Model Inference
Run inference with `trust_remote_code=True` so that our modeling code is used. The example below uses the most basic hyperparameters.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
base_model_id = "ruliadai/falcon-base-v1.1"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
)
tokenizer.pad_token = tokenizer.eos_token
model.eval()
print(model.generation_config)

# Run inference in a simple REPL loop
while True:
    prompt = input("Instruction: ")
    # Move inputs to the same device as the model
    model_input = tokenizer(
        prompt, return_tensors="pt", return_token_type_ids=False
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **model_input,
            max_new_tokens=800,
            do_sample=False,
            repetition_penalty=1.15,
            use_cache=True,
        )
    print(tokenizer.decode(output_ids[0]))
```
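With `do_sample=False` the model decodes greedily and deterministically, and any `temperature` setting is ignored; set `do_sample=True` with a nonzero `temperature` if you want varied completions.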
### How to run inference
Set up and activate your virtual environment or conda environment:
```bash
python3 -m venv env \
&& source env/bin/activate
```
Install torch:
```bash
pip3 install torch torchvision torchaudio
```
Note that you may need to install torch according to your system requirements and drivers; see https://pytorch.org/get-started/locally/ for the right command.
Install requirements:
```bash
pip3 install --upgrade --force-reinstall transformers accelerate flash-attn hf_transfer
```
Save the inference example above as `inference.py`, then run it:
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> python3 inference.py
```
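`HF_HUB_ENABLE_HF_TRANSFER=1` enables the faster Rust-based download backend provided by the `hf_transfer` package installed above.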
If flash-attn is broken, reinstall it from a clean cache:
```bash
pip3 uninstall flash-attn
pip3 cache purge
pip3 install flash-attn
```
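If the reinstall still fails, check that your torch build matches your CUDA toolkit; `flash-attn` compiles its CUDA extensions against whichever torch is already installed.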
## Model Evaluation
### Measured Benchmarks (by Ruliad)
| MODEL | AVERAGE | MMLU (5-shot) | TruthfulQA (0-shot) | ARC (25-shot) | GSM8K (5-shot) | HellaSwag (10-shot) | Winogrande (5-shot) |
| --------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| Falcon-Base-v1.1 | 0.6440 | 0.5683 | 0.5263 | 0.6041 | 0.5542 | 0.8280 | 0.7806 |
| Llama-3-8B | 0.6300 | 0.6513 | 0.4385 | 0.5904 | 0.5034 | 0.8223 | 0.7751 |
| Mistral-7B-v0.1 | 0.6130 | 0.6233 | 0.4258 | 0.6220 | 0.3859 | 0.8332 | 0.7861 |
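The AVERAGE column appears to be the unweighted mean of the six benchmark scores; a quick check for the Falcon-Base-v1.1 row:
```python
# Six reported scores: MMLU, TruthfulQA, ARC, GSM8K, HellaSwag, Winogrande.
scores = [0.5683, 0.5263, 0.6041, 0.5542, 0.8280, 0.7806]
print(sum(scores) / len(scores))  # ~0.644
```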
### Evaluation Replication
**Install Eval Harness**
To install the `lm-eval` package from the GitHub repository, run:
```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
pip install hf_transfer accelerate transformers flash_attn
```
**Benchmarking**
To evaluate our model, run the commands below.

Evaluating MMLU, GSM8K, and Winogrande (5-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
--model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
--tasks mmlu,gsm8k,winogrande \
--device cuda:0 \
--num_fewshot 5 \
--batch_size 1
```
Evaluating TruthfulQA (0-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
--model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
--tasks truthfulqa_mc2 \
--device cuda:0 \
--batch_size 1
```
Evaluating HellaSwag (10-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
--model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
--tasks hellaswag \
--device cuda:0 \
--num_fewshot 10 \
--batch_size 1
```
Evaluating ARC-Challenge (25-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
--model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
--tasks arc_challenge \
--device cuda:0 \
--num_fewshot 25 \
--batch_size 1
```
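All four commands can also be driven from Python through the harness's `simple_evaluate` API. A minimal sketch, assuming a recent lm-eval 0.4.x release (where the Hugging Face backend is named `hf` rather than `hf-auto`):
```python
import lm_eval

# (task, num_fewshot) pairs matching the commands above.
TASKS = [
    ("mmlu", 5),
    ("gsm8k", 5),
    ("winogrande", 5),
    ("truthfulqa_mc2", 0),
    ("hellaswag", 10),
    ("arc_challenge", 25),
]

for task, shots in TASKS:
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True",
        tasks=[task],
        num_fewshot=shots,
        batch_size=1,
        device="cuda:0",
    )
    # Per-task metrics live under results["results"].
    print(task, results["results"][task])
```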