---
library_name: transformers
tags: []
---

# Falcon-11B-Base-V1.1

The Falcon-11B-Base-V1.1 Large Language Model (LLM) is a pretrained generative text model with 11.1 billion parameters.

## Model Specifications

- Base model (not instruction-tuned)
- Flash Attention 2
- Untied LM head and word embeddings (this adds roughly 300M parameters over the 10.8B; see the verification sketch after this list)
- 11.1B parameters
- RoPE theta of 500,042
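These claims can be checked directly on a loaded checkpoint. The following is a minimal sketch (not part of the original card); it assumes the weights download succeeds and simply reads the config and embedding tensors:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint (downloads roughly 22 GB of bf16 weights).
model = AutoModelForCausalLM.from_pretrained(
    "ruliadai/falcon-base-v1.1",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Untied LM head and word embeddings: the two weight tensors are distinct objects.
tied = model.get_output_embeddings().weight is model.get_input_embeddings().weight
print("weights tied:", tied)  # expected: False

# Total parameter count (~11.1B) and the RoPE theta from the config.
# The attribute name may differ in the custom modeling code, hence getattr.
print("parameters:", sum(p.numel() for p in model.parameters()))
print("rope_theta:", getattr(model.config, "rope_theta", None))  # expected: 500042
```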

## Model Inference

Run inference with trust_remote_code=True so that our modeling code is used. The example below uses the most basic hyperparameters.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
base_model_id = "ruliadai/falcon-base-v1.1"

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
)
tokenizer.pad_token = tokenizer.eos_token

# Run inference
while True:
    prompt = input("Instruction: ")
    model_input = tokenizer(
        prompt, return_tensors="pt", return_token_type_ids=False
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **model_input,
            max_new_tokens=800,
            do_sample=False,
            repetition_penalty=1.15,
            use_cache=True,
        )
    print(tokenizer.decode(output_ids[0]))
```
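The script above configures left padding and a pad token but only ever generates from a single prompt. As an aside not in the original card, here is a minimal sketch of the batched generation those settings are for, reusing the `model` and `tokenizer` loaded above:

```python
# Batched generation: left padding keeps prompts right-aligned so that
# generation continues from the true end of each prompt.
prompts = ["The capital of France is", "Water boils at a temperature of"]
batch = tokenizer(
    prompts,
    return_tensors="pt",
    padding=True,
    return_token_type_ids=False,
).to(model.device)

with torch.no_grad():
    out = model.generate(**batch, max_new_tokens=32, do_sample=False)

for ids in out:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```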

## How to run inference

Set up and activate a venv (or conda env):

```bash
python3 -m venv env \
  && source env/bin/activate
```

Install torch:

```bash
pip3 install torch torchvision torchaudio
```

Note that you may need to install torch according to your system requirements and drivers; see https://pytorch.org/get-started/locally/.

Install requirements:

```bash
pip3 install --upgrade --force-reinstall transformers accelerate flash-attn hf_transfer
```

Run the script (the inference code above, saved as inference.py):

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> python3 inference.py
```

If flash-attn is broken:

```bash
pip3 uninstall flash-attn
pip3 cache purge
pip3 install flash-attn
```
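After reinstalling, a quick import check (a minimal sanity test, not an exhaustive one) confirms the package loads:

```python
# Sanity check: flash-attn imports and reports its version.
import flash_attn
print(flash_attn.__version__)
```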

## Model Evaluation

### Measured Benchmarks (by Ruliad)

| Model | Average | MMLU (5-shot) | TruthfulQA (0-shot) | ARC (25-shot) | GSM8K (5-shot) | HellaSwag (10-shot) | Winogrande (5-shot) |
|---|---|---|---|---|---|---|---|
| Falcon-Base-v1.1 | 0.6440 | 0.5683 | 0.5263 | 0.6041 | 0.5542 | 0.8280 | 0.7806 |
| Llama-3-8B | 0.6300 | 0.6513 | 0.4385 | 0.5904 | 0.5034 | 0.8223 | 0.7751 |
| Mistral-7B-v0.1 | 0.6130 | 0.6233 | 0.4258 | 0.6220 | 0.3859 | 0.8332 | 0.7861 |
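The Average column appears to be the unweighted mean of the six task scores, rounded to three decimal places; a quick check (an illustrative snippet, not from the original card):

```python
# Recompute the Average column as the unweighted mean of the six task scores.
scores = {
    "Falcon-Base-v1.1": [0.5683, 0.5263, 0.6041, 0.5542, 0.8280, 0.7806],
    "Llama-3-8B":       [0.6513, 0.4385, 0.5904, 0.5034, 0.8223, 0.7751],
    "Mistral-7B-v0.1":  [0.6233, 0.4258, 0.6220, 0.3859, 0.8332, 0.7861],
}
for name, s in scores.items():
    print(f"{name}: {sum(s) / len(s):.3f}")
# Prints 0.644, 0.630, 0.613, matching the table's Average values.
```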

### Evaluation Replication

#### Install Eval Harness

To install the lm-eval package from the GitHub repository, run:

```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
pip install hf_transfer accelerate transformers flash_attn
```

#### Benchmarking

To evaluate our model:

**Evaluating MMLU, GSM8K, and WG (5-shot):**

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks mmlu,gsm8k,winogrande \
    --device cuda:0 \
    --num_fewshot 5 \
    --batch_size 1
```

**Evaluating TQA (0-shot):**

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks truthfulqa_mc2 \
    --device cuda:0 \
    --batch_size 1
```

**Evaluating HS (10-shot):**

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks hellaswag \
    --device cuda:0 \
    --num_fewshot 10 \
    --batch_size 1
```

**Evaluating ARC (25-shot):**

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
    --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
    --tasks arc_challenge \
    --device cuda:0 \
    --num_fewshot 25 \
    --batch_size 1
```
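Recent versions of the harness also expose a Python API. The following is a hedged sketch assuming lm-eval 0.4.x, where the model type is `hf` rather than the `hf-auto` used above; it mirrors the first CLI invocation:

```python
# Sketch: the 5-shot MMLU/GSM8K/WG evaluation via lm-eval's Python API.
# Assumes lm-eval 0.4.x (simple_evaluate and the "hf" model type).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True",
    tasks=["mmlu", "gsm8k", "winogrande"],
    num_fewshot=5,
    batch_size=1,
    device="cuda:0",
)
print(results["results"])
```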