metadata

license: other

Koala: A Dialogue Model for Academic Research

This repo contains the weights of the Koala 7B model produced at Berkeley. It is the result of combining the diffs from https://huggingface.co/young-geng/koala with the original Llama 7B model.

This version has then been quantized to 4-bit using GPTQ-for-LLaMa.

Other Koala repos

I have also made these other Koala models available:

Quantization method

This GPTQ model was quantized using GPTQ-for-LLaMa with the following commands:

python3 llama.py /content/koala-7B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save /content/koala-7B-4bit-128g.pt
python3 llama.py /content/koala-7B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors /content/koala-7B-4bit-128g.safetensors

I used the latest Triton branch of GPTQ-for-LLaMa but they can also be loaded with the CUDA branch.

Provided files

I have provided both a pt and safetensors file. Either should work.

If both are present in the model directory for text-generation-webui I am not sure which it chooses, so you may want to place only one in the models folder.

The olderFormat file was created with the aim of then converting it to GGML for use with llama.cpp. At present this file does not work.

How to run with `text-generation-webui`

GPTQ model files provided will not load as-is with oobaboogas text-generation-webui.

These model files require the latest version of the GPTQ code.

Here are the commands I used to clone the Triton branch of GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI:

git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
git clone https://github.com/oobabooga/text-generation-webui
mkdir -p text-generation-webui/repositories
ln -s GPTQ-for-LLaMa text-generation-webui/repositories/GPTQ-for-LLaMa

Then install this model into text-generation-webui/models and launch the UI as follows:

cd text-generation-webui
python server.py --model koala-7B-4bit-128g --wbits 4 --groupsize 128 --model_type Llama # add any other command line args you want

The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.

If you cannot use the Triton branch of GPTQ for any reason, you can alternatively use the CUDA branch instead:

git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install

Then link that into text-generation-webui/repositories as described above.

How the Koala delta weights were merged

The Koala delta weights were originally merged using the following commands, producing koala-7B-HF:

git clone https://github.com/young-geng/EasyLM

git clone https://huggingface.co/nyanko7/LLaMA-7B

mkdir koala_diffs && cd koala_diffs && wget https://huggingface.co/young-geng/koala/resolve/main/koala_7b_diff_v2

cd EasyLM

PYTHON_PATH="${PWD}:$PYTHONPATH" python \
-m EasyLM.models.llama.convert_torch_to_easylm \
--checkpoint_dir=/content/LLaMA-7B \
--output_file=/content/llama-7B-LM \
--streaming=True

PYTHON_PATH="${PWD}:$PYTHONPATH" python \
-m EasyLM.scripts.diff_checkpoint --recover_diff=True \
--load_base_checkpoint='params::/content/llama-7B-LM' \
--load_target_checkpoint='params::/content/koala_diffs/koala_7b_diff_v2' \
--output_file=/content/koala_7b.diff.weights \
--streaming=True

PYTHON_PATH="${PWD}:$PYTHONPATH" python \
-m EasyLM.models.llama.convert_easylm_to_hf --model_size=7b \
--output_dir=/content/koala-7B-HF \
--load_checkpoint='params::/content/koala_7b.diff.weights' \
--tokenizer_path=/content/LLaMA-7B/tokenizer.model

Further info

Check out the following links to learn more about the Berkeley Koala model.

License

The model weights are intended for academic research only, subject to the model License of LLaMA, Terms of Use of the data generated by OpenAI, and Privacy Practices of ShareGPT. Any other usage of the model weights, including but not limited to commercial usage, is strictly prohibited.