Join the Coffee & AI Discord for AI Stuff and things!
[![Discord](https://img.shields.io/discord/232596713892872193?logo=discord)](https://discord.gg/2JhHVh7CGu)

## Get the base model here:

Base model quantizations by TheBloke:
- https://huggingface.co/TheBloke/Llama-2-13B-GGML
- https://huggingface.co/TheBloke/Llama-2-13B-GPTQ

## Prompting for this model:

A brief warning: no alignment, sanitization, or other filtering has been applied to the dataset or to the outputs. This is a completely raw model and may behave unpredictably or create unpleasant scenarios.

The base Llama 2 is a text completion model: it continues whatever text you give it, in whatever direction you set up. This is not an instruction-tuned model, so don't give it instructions. Write the opening of the story and let it continue.

Correct prompting:
```
He grabbed his sword, his gleaming armor, he readied himself. The battle was coming, he walked into the dawn light and
```

Incorrect prompting:
```
Write a story about...
```

This model has been trained to generate as much text as possible, so set a hard limit on output length (a maximum number of new tokens or a maximum sequence length). For example, with one prompt I average about 7000 output tokens. Without a limit it will just keep going.
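Most frontends expose this cap directly (in Hugging Face `transformers` it is the `max_new_tokens` argument to `generate()`). The sketch below shows the mechanism itself in plain Python, assuming a hypothetical `next_token` function that stands in for one model step; `generate_capped` and `next_token` are illustrative names, not part of any library.

```python
from typing import Callable, List, Optional


def generate_capped(
    prompt_tokens: List[str],
    next_token: Callable[[List[str]], str],
    max_new_tokens: int,
    stop_token: Optional[str] = None,
) -> List[str]:
    """Continue `prompt_tokens` until `max_new_tokens` tokens are produced
    or `stop_token` is emitted, whichever comes first.

    `next_token` is a hypothetical stand-in for one forward pass of the model:
    it receives all tokens so far and returns the next one.
    """
    if max_new_tokens < 0:
        raise ValueError("max_new_tokens must be non-negative")
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):  # hard cap: never more than max_new_tokens steps
        tok = next_token(tokens)
        if stop_token is not None and tok == stop_token:
            break  # optional early stop, e.g. on an end-of-sequence token
        tokens.append(tok)
    return tokens[len(prompt_tokens):]  # only the newly generated continuation


# A toy "model" that never stops on its own, like the behaviour described above.
def endless(tokens: List[str]) -> str:
    return "and"


out = generate_capped(["The", "battle", "was", "coming"], endless, max_new_tokens=5)
print(out)  # ['and', 'and', 'and', 'and', 'and']
```

The cap is enforced by the loop bound rather than by the model, which is why it works even for a model that has been trained never to stop.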

## Training procedure

PEFT:

The following `bitsandbytes` quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: fp4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float32
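If you want to reproduce this setup with Hugging Face `transformers`, the settings above correspond to a `BitsAndBytesConfig` object. This is a sketch reconstructed from the listed values, assuming a recent `transformers` with `torch` and `bitsandbytes` installed; it is not the original training script.

```python
import torch
from transformers import BitsAndBytesConfig

# One keyword argument per setting in the list above.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,                     # base weights are quantized to 4-bit
    # The llm_int8_* options only apply to 8-bit loading.
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="fp4",             # plain FP4 rather than NF4
    bnb_4bit_use_double_quant=False,       # no nested quantization of the quantization constants
    bnb_4bit_compute_dtype=torch.float32,  # computation runs in fp32
)
```

To load a model with the same quantization, pass the object as `quantization_config=bnb_config` to `AutoModelForCausalLM.from_pretrained(...)` along with whichever base checkpoint you are using.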

Training ran for 3500 (3 epochs) on a storywriting dataset that is still in testing, and took 14 hours on an RTX 3090 Ti.