---
library_name: transformers
tags: []
---
# Falcon-11B-Base-V1.1
The Falcon-11B-Base-V1.1 Large Language Model (LLM) is a pretrained generative text model with 11.1 billion parameters.
## Model Specifications
- Base model (not instruction-tuned)
- Flash Attention 2
- Untied LM head and word embeddings (this adds 300M parameters over the 10.8B tied-weight count; see the config snippet below)
- 11.1B parameters
- RoPE theta of 500,042
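You can confirm these settings directly from the published config without downloading the weights. A minimal sketch, assuming the config exposes the standard Hugging Face `rope_theta` and `tie_word_embeddings` fields:
```python
from transformers import AutoConfig

# Read only the model config from the Hub (no weights downloaded).
config = AutoConfig.from_pretrained(
    "ruliadai/falcon-base-v1.1", trust_remote_code=True
)
print(config.rope_theta)           # expected: 500042
print(config.tie_word_embeddings)  # expected: False (untied LM head)
```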
### Model Inference
Run inference with `trust_remote_code=True` so that our modeling code is used. The example below uses the most basic hyperparameters.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
base_model_id = "ruliadai/falcon-base-v1.1"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
)
tokenizer.pad_token = tokenizer.eos_token
model.eval()
print(model.generation_config)

# Run inference in a simple REPL loop
while True:
    prompt = input("Instruction: ")
    # Move inputs to the same device as the model
    model_input = tokenizer(
        prompt, return_tensors="pt", return_token_type_ids=False
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **model_input,
            max_new_tokens=800,
            do_sample=False,
            repetition_penalty=1.15,
            use_cache=True,
        )
    print(tokenizer.decode(output_ids[0]))
```
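With `do_sample=False` the model decodes greedily and deterministically, and any `temperature` setting is ignored; set `do_sample=True` with a nonzero `temperature` if you want varied completions.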
### How to run inference
Set up and activate your virtual environment or conda environment:
```bash
python3 -m venv env \
&& source env/bin/activate
```
Install torch:
```bash
pip3 install torch torchvision torchaudio
```
Note that you may need to install torch according to your system requirements and drivers; see https://pytorch.org/get-started/locally/ for the right command.
Install requirements:
```bash
pip3 install --upgrade --force-reinstall transformers accelerate flash-attn hf_transfer
```
Save the inference example above as `inference.py`, then run it:
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> python3 inference.py
```
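`HF_HUB_ENABLE_HF_TRANSFER=1` enables the faster Rust-based download backend provided by the `hf_transfer` package installed above.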
If flash-attn is broken, reinstall it from a clean cache:
```bash
pip3 uninstall flash-attn
pip3 cache purge
pip3 install flash-attn
```
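If the reinstall still fails, check that your torch build matches your CUDA toolkit; `flash-attn` compiles its CUDA extensions against whichever torch is already installed.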
## Model Evaluation
### Measured Benchmarks (by Ruliad)
| MODEL | AVERAGE | MMLU (5-shot) | TruthfulQA (0-shot) | ARC (25-shot) | GSM8K (5-shot) | HellaSwag (10-shot) | Winogrande (5-shot) |
| --------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| Falcon-Base-v1.1 | 0.6440 | 0.5683 | 0.5263 | 0.6041 | 0.5542 | 0.8280 | 0.7806 |
| Llama-3-8B | 0.6300 | 0.6513 | 0.4385 | 0.5904 | 0.5034 | 0.8223 | 0.7751 |
| Mistral-7B-v0.1 | 0.6130 | 0.6233 | 0.4258 | 0.6220 | 0.3859 | 0.8332 | 0.7861 |
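The AVERAGE column appears to be the unweighted mean of the six benchmark scores; a quick check for the Falcon-Base-v1.1 row:
```python
# Six reported scores: MMLU, TruthfulQA, ARC, GSM8K, HellaSwag, Winogrande.
scores = [0.5683, 0.5263, 0.6041, 0.5542, 0.8280, 0.7806]
print(sum(scores) / len(scores))  # ~0.644
```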
### Evaluation Replication
**Install Eval Harness**
To install the `lm-eval` package from the GitHub repository, run:
```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
pip install hf_transfer accelerate transformers flash_attn
```
**Benchmarking**
To evaluate our model, run the commands below.

Evaluating MMLU, GSM8K, and Winogrande (5-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
--model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
--tasks mmlu,gsm8k,winogrande \
--device cuda:0 \
--num_fewshot 5 \
--batch_size 1
```
Evaluating TruthfulQA (0-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
--model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
--tasks truthfulqa_mc2 \
--device cuda:0 \
--batch_size 1
```
Evaluating HellaSwag (10-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
--model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
--tasks hellaswag \
--device cuda:0 \
--num_fewshot 10 \
--batch_size 1
```
Evaluating ARC-Challenge (25-shot):
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=<YOUR_HF_TOKEN> accelerate launch -m lm_eval --model hf-auto \
--model_args pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True \
--tasks arc_challenge \
--device cuda:0 \
--num_fewshot 25 \
--batch_size 1
```
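All four commands can also be driven from Python through the harness's `simple_evaluate` API. A minimal sketch, assuming a recent lm-eval 0.4.x release (where the Hugging Face backend is named `hf` rather than `hf-auto`):
```python
import lm_eval

# (task, num_fewshot) pairs matching the commands above.
TASKS = [
    ("mmlu", 5),
    ("gsm8k", 5),
    ("winogrande", 5),
    ("truthfulqa_mc2", 0),
    ("hellaswag", 10),
    ("arc_challenge", 25),
]

for task, shots in TASKS:
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=ruliadai/falcon-base-v1.1,trust_remote_code=True",
        tasks=[task],
        num_fewshot=shots,
        batch_size=1,
        device="cuda:0",
    )
    # Per-task metrics live under results["results"].
    print(task, results["results"][task])
```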