---
library_name: transformers
tags: []
---

# Falcon-11B-Base-V1.1

The Falcon-11B-Base-V1.1 Large Language Model (LLM) is a pretrained generative text model with 11.1 billion parameters.

## Model Specifications

- Base model (not instruction tuned)
- Flash Attention 2
- Untied LM head and word embeddings (this adds 300M parameters on top of the 10.8B)
- 11.1B parameters
- RoPE theta of 500,042

These values can be checked directly from the published config, as shown in the sketch below.
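As a quick sanity check before downloading any weights, the untied embeddings and RoPE base can be read from the model's configuration alone. A minimal sketch, assuming the standard field names `rope_theta` and `tie_word_embeddings` (a remote-code model may name them differently, so inspect `config` if they come back missing):

```python
from transformers import AutoConfig

# Load only the config (no weights); trust_remote_code is required for the
# custom modeling code shipped with this repository.
config = AutoConfig.from_pretrained(
    "ruliadai/falcon-base-v1.1",
    trust_remote_code=True,
)

# Field names assumed here; print(config) to see what this repo actually exposes.
print("rope_theta:         ", getattr(config, "rope_theta", "n/a"))
print("tie_word_embeddings:", getattr(config, "tie_word_embeddings", "n/a"))
print("hidden_size:        ", getattr(config, "hidden_size", "n/a"))
```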
### Model Inference

Run inference with `trust_remote_code=True` to use our modeling code. An example with the most basic hyperparameters is shown below.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
base_model_id = "ruliadai/falcon-base-v1.1"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
)
tokenizer.pad_token = tokenizer.eos_token

# Run inference with greedy decoding
model.eval()
print(model.generation_config)
while True:
    prompt = input("Instruction: ")
    model_input = tokenizer(
        prompt, return_tensors="pt", return_token_type_ids=False
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(
            **model_input,
            max_new_tokens=800,
            do_sample=False,
            repetition_penalty=1.15,
            use_cache=True,
        )
    print(tokenizer.decode(output[0]))
```

### How to run inference

Set up and activate a venv (or conda env):

```bash
python3 -m venv env \
  && source env/bin/activate
```

Install torch:

```bash
pip3 install torch torchvision torchaudio
```

Note that you may need to install torch according to your system requirements and drivers (https://pytorch.org/get-started/locally/).

Install the remaining requirements:

```bash
pip3 install --upgrade --force-reinstall transformers accelerate flash-attn hf_transfer
```

Run the script:

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN= python3 inference.py
```

If flash-attn is broken:

```bash
pip3 uninstall flash-attn
pip3 cache purge
pip3 install flash-attn
```

## Model Evaluation

### Measured Benchmarks (by Ruliad)

| MODEL            | AVERAGE | MMLU (5-s) | TQA (0-s) | ARC (25-s) | GSM8K (5-s) | HS (10-s) | WG (5-s) |
| ---------------- | ------- | ---------- | --------- | ---------- | ----------- | --------- | -------- |
| Falcon-Base-v1.1 | 0.6440  | 0.5683     | 0.5263    | 0.6041     | 0.5542      | 0.8280    | 0.7806   |
| Llama-3-8B       | 0.6300  | 0.6513     | 0.4385    | 0.5904     | 0.5034      | 0.8223    | 0.7751   |
| Mistral-7B-v0.1  | 0.6130  | 0.6233     | 0.4258    | 0.6220     | 0.3859      | 0.8332    | 0.7861   |

### Evaluation Replication

**Install Eval Harness**

To install the `lm-eval` package from the GitHub repository, run:

```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
pip install hf_transfer accelerate transformers flash_attn
```

**Benchmarking**

To evaluate our model:

Evaluating MMLU, GSM8K and WG on 5-shot:

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN= accelerate launch -m lm_eval --model hf-auto \
  --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
  --tasks mmlu,gsm8k,winogrande \
  --device cuda:0 \
  --num_fewshot 5 \
  --batch_size 1
```

Evaluating TQA on 0-shot:

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN= accelerate launch -m lm_eval --model hf-auto \
  --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
  --tasks truthfulqa_mc2 \
  --device cuda:0 \
  --batch_size 1
```

Evaluating HS on 10-shot:

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN= accelerate launch -m lm_eval --model hf-auto \
  --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
  --tasks hellaswag \
  --device cuda:0 \
  --num_fewshot 10 \
  --batch_size 1
```

Evaluating ARC on 25-shot:

```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN= accelerate launch -m lm_eval --model hf-auto \
  --model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
  --tasks arc_challenge \
  --device cuda:0 \
  --num_fewshot 25 \
  --batch_size 1
```
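If you add `--output_path results` to the commands above, the harness writes its scores to JSON files that you can aggregate yourself. A minimal sketch for collecting one headline metric per task: the metric key names (`acc,none`, `acc_norm,none`, etc.) follow the v0.4-style results layout and are assumptions that may need adjusting for your harness version.

```python
import glob
import json

# Headline metric keys vary across harness versions and tasks; extend as needed.
HEADLINE_KEYS = ("acc_norm,none", "acc,none", "exact_match,strict-match", "exact_match,none")

scores = {}
for path in glob.glob("results/**/*.json", recursive=True):
    with open(path) as f:
        data = json.load(f)
    # v0.4-style layout: a top-level "results" dict keyed by task name.
    for task, metrics in data.get("results", {}).items():
        for key in HEADLINE_KEYS:
            if key in metrics:
                scores[task] = metrics[key]
                break

for task, score in sorted(scores.items()):
    print(f"{task:20s} {score:.4f}")
if scores:
    print(f"{'average':20s} {sum(scores.values()) / len(scores):.4f}")
```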