---
library_name: transformers
tags: []
---

# Falcon-11B-Base-V1.1
The Falcon-11B-Base-V1.1 Large Language Model (LLM) is a pretrained generative text model with 11.1 billion parameters.

## Model Specifications
- Base model (not instruction-tuned)
- Flash Attention 2
- Untied LM head and word embeddings (untying adds roughly 300M parameters on top of the 10.8B tied configuration; see the config check below)
- 11.1B parameters
- RoPE theta: 500,042
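
These settings can be verified from the published config. A minimal sketch, assuming the remote config exposes the usual Hugging Face attribute names (`tie_word_embeddings`, `rope_theta`); the custom modeling code may name them differently:

```python
from transformers import AutoConfig

# Load the remote config; trust_remote_code is required for the custom architecture.
config = AutoConfig.from_pretrained("ruliadai/falcon-base-v1.1", trust_remote_code=True)

# Attribute names below follow common HF conventions and are assumptions here;
# getattr with a default keeps the check safe if the custom config differs.
print("tie_word_embeddings:", getattr(config, "tie_word_embeddings", "n/a"))  # expect False (untied)
print("rope_theta:", getattr(config, "rope_theta", "n/a"))                    # expect 500,042
```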



### Inference Script
Load the model with `trust_remote_code=True` so that our modeling code is used. The example below uses the most basic hyperparameters.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
base_model_id = "ruliadai/falcon-base-v1.1"

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
)
tokenizer.pad_token = tokenizer.eos_token

# Run inference
while True:
    prompt = input("Instruction: ")
    model_input = tokenizer(
        prompt, return_tensors="pt", return_token_type_ids=False
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(
            **model_input,
            max_new_tokens=800,
            do_sample=False,          # greedy decoding
            repetition_penalty=1.15,
            use_cache=True,
        )
    print(tokenizer.decode(output[0]))
```
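
The tokenizer is created with `padding_side="left"` so that batched prompts generate correctly (right padding would insert pad tokens between a short prompt and its continuation). A small batched-generation sketch, reusing the `model` and `tokenizer` objects from the script above:

```python
# Batched generation sketch; `model` and `tokenizer` come from the script above.
prompts = ["The capital of France is", "Water boils at a temperature of"]
batch = tokenizer(
    prompts, return_tensors="pt", padding=True, return_token_type_ids=False
).to(model.device)

with torch.no_grad():
    out = model.generate(
        **batch,
        max_new_tokens=32,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

for seq in out:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```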

### How to run inference

Set up and activate a virtual environment (venv or conda):

```bash
python3 -m venv env \
  && source env/bin/activate
```

Install torch:
```bash
pip3 install torch torchvision torchaudio
```
Note that you may need to install torch differently depending on your system requirements and drivers (see https://pytorch.org/get-started/locally/).


Install requirements:
```bash
pip3 install --upgrade --force-reinstall transformers accelerate flash-attn hf_transfer
```
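
Before launching, it can help to confirm the environment actually supports the settings used by the script (`bfloat16` and Flash Attention 2 both require a reasonably recent NVIDIA GPU). A quick sanity-check sketch:

```python
import torch

# bfloat16 and Flash Attention 2 both need a CUDA GPU (Ampere or newer).
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("bf16 supported:", torch.cuda.is_bf16_supported())

try:
    import flash_attn  # noqa: F401
    print("flash-attn import: OK")
except ImportError as err:
    print("flash-attn import failed:", err)
```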

Run the script (the inference example above, saved as `inference.py`):

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> python3 inference.py
```


If flash-attn is broken, reinstall it from a clean cache:
```bash
pip3 uninstall flash-attn
pip3 cache purge
pip3 install flash-attn
```
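
If flash-attn cannot be installed at all on your hardware, a possible workaround (an assumption on our part; the remote modeling code may still require flash-attn) is to load the model with a different attention backend:

```python
# Fallback sketch: PyTorch SDPA attention instead of Flash Attention 2.
# Whether this works depends on the remote modeling code.
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="sdpa",  # or "eager"
)
```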


## Model Evaluation

### Measured Benchmarks (by Ruliad)

| MODEL            | AVERAGE | MMLU (5-shot) | TQA (0-shot) | ARC (25-shot) | GSM8K (5-shot) | HS (10-shot) | WG (5-shot) |
| ---------------- | ------- | ------------- | ------------ | ------------- | -------------- | ------------ | ----------- |
| Falcon-Base-v1.1 | 0.6440  | 0.5683        | 0.5263       | 0.6041        | 0.5542         | 0.8280       | 0.7806      |
| Llama-3-8B       | 0.6300  | 0.6513        | 0.4385       | 0.5904        | 0.5034         | 0.8223       | 0.7751      |
| Mistral-7B-v0.1  | 0.6130  | 0.6233        | 0.4258       | 0.6220        | 0.3859         | 0.8332       | 0.7861      |

TQA = TruthfulQA (mc2), HS = HellaSwag, WG = Winogrande.

### Evaluation Replication

**Install Eval Harness**

To install the `lm-eval` package from the GitHub repository, run:
```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
pip install hf_transfer accelerate transformers flash_attn
```
**Benchmarking**

To evaluate our model:

Evaluating MMLU, GSM8K, and WG (5-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks mmlu,gsm8k,winogrande \
    --device cuda:0 \
    --num_fewshot 5 \
    --batch_size 1
```

Evaluating TQA (0-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks truthfulqa_mc2 \
    --device cuda:0 \
    --batch_size 1
```

Evaluating HS (10-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks hellaswag \
    --device cuda:0 \
    --num_fewshot 10 \
    --batch_size 1
```

Evaluating ARC (25-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks arc_challenge \
    --device cuda:0 \
    --num_fewshot 25 \
    --batch_size 1
```
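
To reproduce the AVERAGE column of the table above, add `--output_path results/` to each command and average the primary metric per task. A minimal parsing sketch (the JSON layout and metric key names vary across lm-eval versions, so the names below are assumptions in the style of v0.4 output):

```python
import glob
import json

# Assumes each run was launched with `--output_path results/`, which writes one
# results_*.json per run. Metric keys like "acc,none" follow lm-eval v0.4-style
# output and may differ in other versions.
scores = []
for path in glob.glob("results/**/results_*.json", recursive=True):
    with open(path) as f:
        data = json.load(f)
    for task, metrics in data.get("results", {}).items():
        # Take the first accuracy-like metric reported for the task.
        for key, value in metrics.items():
            if key.startswith(("acc", "exact_match")) and isinstance(value, float):
                scores.append(value)
                print(f"{task}: {key} = {value:.4f}")
                break

if scores:
    print(f"average over {len(scores)} task scores: {sum(scores) / len(scores):.4f}")
```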