---
library_name: transformers
tags: []
---

# Falcon-11B-Base-V1.1

The Falcon-11B-Base-V1.1 Large Language Model (LLM) is a pretrained generative text model with 11.1 billion parameters.

## Model Specifications

- Base model (not instruction-tuned)
- Flash Attention 2
- Untied LM head and word embeddings (this adds roughly 300M parameters over the 10.8B; see the verification sketch after this list)
- 11.1B parameters
- RoPE theta of 500,042
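These claims can be checked directly on a loaded checkpoint. The following is a minimal sketch (not part of the original card); it assumes the weights download succeeds and simply reads the config and embedding tensors:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint (downloads roughly 22 GB of bf16 weights).
model = AutoModelForCausalLM.from_pretrained(
    "ruliadai/falcon-base-v1.1",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Untied LM head and word embeddings: the two weight tensors are distinct objects.
tied = model.get_output_embeddings().weight is model.get_input_embeddings().weight
print("weights tied:", tied)  # expected: False

# Total parameter count (~11.1B) and the RoPE theta from the config.
# The attribute name may differ in the custom modeling code, hence getattr.
print("parameters:", sum(p.numel() for p in model.parameters()))
print("rope_theta:", getattr(model.config, "rope_theta", None))  # expected: 500042
```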

## Model Inference

Run inference with trust_remote_code=True so that our modeling code is used. The example below uses the most basic hyperparameters.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
base_model_id = "ruliadai/falcon-base-v1.1"

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
)
tokenizer.pad_token = tokenizer.eos_token

# Run inference
while True:
    prompt = input("Instruction: ")
    model_input = tokenizer(
        prompt, return_tensors="pt", return_token_type_ids=False
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **model_input,
            max_new_tokens=800,
            do_sample=False,
            repetition_penalty=1.15,
            use_cache=True,
        )
    print(tokenizer.decode(output_ids[0]))
```
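The script above configures left padding and a pad token but only ever generates from a single prompt. As an aside not in the original card, here is a minimal sketch of the batched generation those settings are for, reusing the `model` and `tokenizer` loaded above:

```python
# Batched generation: left padding keeps prompts right-aligned so that
# generation continues from the true end of each prompt.
prompts = ["The capital of France is", "Water boils at a temperature of"]
batch = tokenizer(
    prompts,
    return_tensors="pt",
    padding=True,
    return_token_type_ids=False,
).to(model.device)

with torch.no_grad():
    out = model.generate(**batch, max_new_tokens=32, do_sample=False)

for ids in out:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```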

## How to run inference

Set up and activate a venv (or conda env):

```bash
python3 -m venv env \
  && source env/bin/activate
```

Install torch:

```bash
pip3 install torch torchvision torchaudio
```

Note that you may need to install torch according to your system requirements and drivers; see https://pytorch.org/get-started/locally/.

Install requirements:

```bash
pip3 install --upgrade --force-reinstall transformers accelerate flash-attn hf_transfer
```

Run the script (the inference code above, saved as inference.py):

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> python3 inference.py
```

If flash-attn is broken:

```bash
pip3 uninstall flash-attn
pip3 cache purge
pip3 install flash-attn
```
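After reinstalling, a quick import check (a minimal sanity test, not an exhaustive one) confirms the package loads:

```python
# Sanity check: flash-attn imports and reports its version.
import flash_attn
print(flash_attn.__version__)
```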

## Model Evaluation

### Measured Benchmarks (by Ruliad)

| Model | Average | MMLU (5-shot) | TruthfulQA (0-shot) | ARC (25-shot) | GSM8K (5-shot) | HellaSwag (10-shot) | Winogrande (5-shot) |
|---|---|---|---|---|---|---|---|
| Falcon-Base-v1.1 | 0.6440 | 0.5683 | 0.5263 | 0.6041 | 0.5542 | 0.8280 | 0.7806 |
| Llama-3-8B | 0.6300 | 0.6513 | 0.4385 | 0.5904 | 0.5034 | 0.8223 | 0.7751 |
| Mistral-7B-v0.1 | 0.6130 | 0.6233 | 0.4258 | 0.6220 | 0.3859 | 0.8332 | 0.7861 |
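The Average column appears to be the unweighted mean of the six task scores, rounded to three decimal places; a quick check (an illustrative snippet, not from the original card):

```python
# Recompute the Average column as the unweighted mean of the six task scores.
scores = {
    "Falcon-Base-v1.1": [0.5683, 0.5263, 0.6041, 0.5542, 0.8280, 0.7806],
    "Llama-3-8B":       [0.6513, 0.4385, 0.5904, 0.5034, 0.8223, 0.7751],
    "Mistral-7B-v0.1":  [0.6233, 0.4258, 0.6220, 0.3859, 0.8332, 0.7861],
}
for name, s in scores.items():
    print(f"{name}: {sum(s) / len(s):.3f}")
# Prints 0.644, 0.630, 0.613, matching the table's Average values.
```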

### Evaluation Replication

#### Install Eval Harness

To install the lm-eval package from the GitHub repository, run:

```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
pip install hf_transfer accelerate transformers flash_attn
```

#### Benchmarking

To evaluate our model:

**Evaluating MMLU, GSM8K, and WG (5-shot):**

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks mmlu,gsm8k,winogrande \
    --device cuda:0 \
    --num_fewshot 5 \
    --batch_size 1
```

**Evaluating TQA (0-shot):**

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks truthfulqa_mc2 \
    --device cuda:0 \
    --batch_size 1
```

**Evaluating HS (10-shot):**

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks hellaswag \
    --device cuda:0 \
    --num_fewshot 10 \
    --batch_size 1
```

**Evaluating ARC (25-shot):**

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks arc_challenge \
    --device cuda:0 \
    --num_fewshot 25 \
    --batch_size 1
```
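Recent versions of the harness also expose a Python API. The following is a hedged sketch assuming lm-eval 0.4.x, where the model type is `hf` rather than the `hf-auto` used above; it mirrors the first CLI invocation:

```python
# Sketch: the 5-shot MMLU/GSM8K/WG evaluation via lm-eval's Python API.
# Assumes lm-eval 0.4.x (simple_evaluate and the "hf" model type).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True",
    tasks=["mmlu", "gsm8k", "winogrande"],
    num_fewshot=5,
    batch_size=1,
    device="cuda:0",
)
print(results["results"])
```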