license: other
Koala: A Dialogue Model for Academic Research
This repo contains the weights of the Koala 7B model produced at Berkeley. It is the result of combining the diffs from https://huggingface.co/young-geng/koala with the original Llama 7B model.
This version has then been quantized to 4-bit using GPTQ-for-LLaMa.
Other Koala repos
I have also made these other Koala repos available:
- GPTQ quantized 4bit 13B model in HF format
- Unquantized 13B model in HF format
- Unquantized 7B model in HF format
- Unquantized 7B model in GGML format for llama.cpp
Quantization method
This GPTQ model was quantized using GPTQ-for-LLaMa with the following command:
python3 llama.py /content/koala-7B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save /content/koala-7B-4bit-128g.pt
I created this model using the latest Triton branch of GPTQ-for-LLaMa but I believe it can be run with the CUDA branch also.
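To reuse the same settings for another checkpoint, the command above can be wrapped in a small helper. This is only a sketch: the function name `quantize_cmd` and its argument order are hypothetical, and it uses only the flags shown above. It prints the command instead of running it, so the result can be inspected first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the GPTQ-for-LLaMa command used above for a
# given HF model directory and output file. Nothing is executed.
quantize_cmd() {
    local model_dir="$1"          # e.g. /content/koala-7B-HF
    local out_file="$2"           # e.g. /content/koala-7B-4bit-128g.pt
    local wbits="${3:-4}"         # bit width, 4 as in the command above
    local groupsize="${4:-128}"   # group size, 128 as in the command above
    printf 'python3 llama.py %s c4 --wbits %s --true-sequential --act-order --groupsize %s --save %s\n' \
        "$model_dir" "$wbits" "$groupsize" "$out_file"
}
```

Calling `quantize_cmd /content/koala-7B-HF /content/koala-7B-4bit-128g.pt` prints the command shown above; it would then be run from inside the GPTQ-for-LLaMa checkout.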
Provided files
I have provided both a pt and a safetensors file. Either should work.
If both are present in the model directory, I am not sure which one text-generation-webui picks, so if you need a specific format I recommend downloading only that file.
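Because it is unclear which file is picked when both are present, it can help to check what a model directory actually contains before launching. The following is a minimal sketch; the function name `checkpoint_formats` is hypothetical, and it only looks at file extensions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which checkpoint formats are present in a
# model directory. Prints "pt", "safetensors", "both", or "none".
checkpoint_formats() {
    local dir="$1"
    local has_pt=0
    local has_st=0
    local f
    # An unmatched glob stays literal, so test each candidate for existence.
    for f in "$dir"/*.pt; do
        if [ -e "$f" ]; then has_pt=1; fi
    done
    for f in "$dir"/*.safetensors; do
        if [ -e "$f" ]; then has_st=1; fi
    done
    if [ "$has_pt" -eq 1 ] && [ "$has_st" -eq 1 ]; then
        echo both
    elif [ "$has_pt" -eq 1 ]; then
        echo pt
    elif [ "$has_st" -eq 1 ]; then
        echo safetensors
    else
        echo none
    fi
}
```

If it prints `both` for the model directory, moving one of the two files elsewhere removes the ambiguity.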
The olderFormat file was created with the aim of then converting it to GGML for use with llama.cpp. At present this file does not work.
How to run with text-generation-webui
The model files provided will not load as-is with oobabooga's text-generation-webui.
They require the latest version of the GPTQ code.
Here are the commands I used to clone GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI:
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
git clone https://github.com/oobabooga/text-generation-webui
mkdir -p text-generation-webui/repositories
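# Use an absolute target: a relative target is resolved from inside repositories/ and would dangle.
ln -s "$PWD/GPTQ-for-LLaMa" text-generation-webui/repositories/GPTQ-for-LLaMa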
Then install this model into text-generation-webui/models and run text-generation-webui as follows:
cd text-generation-webui
python server.py --model koala-7B-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type Llama
The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.
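Before starting the server, the directory layout described above can be checked with a short script. This is a sketch under the assumption that the model was placed in a directory named after the `--model` argument; the function name `check_webui_layout` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the layout described above.
# Usage: check_webui_layout <text-generation-webui dir> <model dir name>
# Returns 0 if both directories exist, 1 otherwise.
check_webui_layout() {
    local webui="$1"
    local model="$2"
    local status=0
    # -d follows symlinks, so a dangling link is reported as missing.
    if [ ! -d "$webui/repositories/GPTQ-for-LLaMa" ]; then
        echo "missing or dangling: $webui/repositories/GPTQ-for-LLaMa" >&2
        status=1
    fi
    if [ ! -d "$webui/models/$model" ]; then
        echo "missing model directory: $webui/models/$model" >&2
        status=1
    fi
    return "$status"
}
```

For example, `check_webui_layout text-generation-webui koala-7B-GPTQ-4bit-128g` run from the directory containing both clones returns non-zero and names the problem if the symlink or the model directory is not in place.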
If you cannot use the Triton branch for any reason, I believe it should also work to use the CUDA branch instead:
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
Then link that into text-generation-webui/repositories as described above.
How the Koala delta weights were merged
The Koala delta weights were originally merged using the following commands, producing koala-7B-HF:
git clone https://github.com/young-geng/EasyLM
git clone https://huggingface.co/nyanko7/LLaMA-7B
git clone https://huggingface.co/young-geng/koala koala_diffs
cd EasyLM
# Step 1: convert the original LLaMA 7B checkpoint to EasyLM format.
PYTHONPATH="${PWD}:$PYTHONPATH" python \
    -m EasyLM.models.llama.convert_torch_to_easylm \
    --checkpoint_dir=/content/LLaMA-7B \
    --output_file=/content/llama-7B-LM \
    --streaming=True
# Step 2: apply the Koala diff to the base weights to recover the Koala weights.
PYTHONPATH="${PWD}:$PYTHONPATH" python \
    -m EasyLM.scripts.diff_checkpoint --recover_diff=True \
    --load_base_checkpoint='params::/content/llama-7B-LM' \
    --load_target_checkpoint='params::/content/koala_diffs/koala_7b_diff_v2' \
    --output_file=/content/koala_7b.diff.weights \
    --streaming=True
# Step 3: convert the recovered weights to Hugging Face format.
PYTHONPATH="${PWD}:$PYTHONPATH" python \
    -m EasyLM.models.llama.convert_easylm_to_hf --model_size=7b \
    --output_dir=/content/koala-7B-HF \
    --load_checkpoint='params::/content/koala_7b.diff.weights' \
    --tokenizer_path=/content/LLaMA-7B/tokenizer.model
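The conversion steps take a while, so it can be worth confirming up front that the inputs they reference exist. The following is a minimal sketch, assuming all clones live under one root directory (/content in the commands above); the function name `check_merge_inputs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the merge commands above.
# Usage: check_merge_inputs [root]   (root defaults to /content)
# Returns 0 if every input path exists, 1 otherwise.
check_merge_inputs() {
    local root="${1:-/content}"
    local missing=0
    local path
    for path in \
        "$root/LLaMA-7B" \
        "$root/LLaMA-7B/tokenizer.model" \
        "$root/koala_diffs/koala_7b_diff_v2"
    do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_merge_inputs /content` before the first conversion step lists any path that is not in place.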
Further info
Check out the following links to learn more about the Berkeley Koala model.
- Blog post
- Online demo
- EasyLM: training and serving framework on GitHub
- Documentation for running Koala locally
License
The model weights are intended for academic research only, subject to the model License of LLaMA, Terms of Use of the data generated by OpenAI, and Privacy Practices of ShareGPT. Any other usage of the model weights, including but not limited to commercial usage, is strictly prohibited.