metadata

license: llama3
datasets:
  - NobodyExistsOnTheInternet/ToxicQAFinal

Llama-3-Alpha-Centauri-v0.1-LoRA

Disclaimer

Note: All models and LoRAs from the Centaurus series were created solely for research purposes. Using this model and/or its related LoRA implies agreement with the following terms:

The user is responsible for what they might do with it, including how the output of the model is interpreted and used;
The user should not use the model and its outputs for any illegal purposes;
The user is the only one responsible for any misuse or negative consequences from using this model and/or its related LoRA.

I do not endorse any particular perspectives presented in the training data.

Base

This model and its related LoRA were fine-tuned on https://huggingface.co/failspy/Meta-Llama-3-8B-Instruct-abliterated-v3.

Datasets

https://huggingface.co/datasets/NobodyExistsOnTheInternet/ToxicQAFinal

Fine Tuning

- Quantization Configuration

load_in_4bit=True
bnb_4bit_quant_type="fp4"
bnb_4bit_compute_dtype=compute_dtype
bnb_4bit_use_double_quant=False
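
As a rough sketch, the quantization settings above could be passed to `BitsAndBytesConfig` from `transformers` as follows. The value of `compute_dtype` is not stated in this card, so the `float16` default below is only an assumption, and the names `QUANT_KWARGS` and `build_quantization_config` are hypothetical helpers, not part of the original training script.

```python
# Values copied from the list above. The compute dtype is NOT given in the
# card, so it stays a parameter (float16 is only an assumed default).
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "fp4",
    "bnb_4bit_use_double_quant": False,
}


def build_quantization_config(compute_dtype_name="float16"):
    """Build a BitsAndBytesConfig; requires torch, transformers and bitsandbytes."""
    # Imported lazily so the plain settings can be inspected without the
    # heavy dependencies installed.
    import torch
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(
        bnb_4bit_compute_dtype=getattr(torch, compute_dtype_name),
        **QUANT_KWARGS,
    )
```

The resulting object would typically be handed to `AutoModelForCausalLM.from_pretrained(..., quantization_config=...)` when loading the base model.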

- PEFT Parameters

lora_alpha=64
lora_dropout=0.05
r=128
bias="none"
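
A minimal sketch of how these PEFT parameters might map onto `peft.LoraConfig`. The `task_type` value is an assumption (causal language modeling is the usual choice for Llama-3), the target modules are not listed in this card and are therefore left to the library defaults, and `LORA_KWARGS`, `lora_scaling` and `build_lora_config` are hypothetical names. With standard LoRA scaling, the adapter update is multiplied by `lora_alpha / r`, which the helper below computes.

```python
# Values copied from the list above.
LORA_KWARGS = {
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "r": 128,
    "bias": "none",
}


def lora_scaling(lora_alpha, r):
    """Standard LoRA scaling factor applied to the low-rank update."""
    return lora_alpha / r


def build_lora_config():
    """Build a LoraConfig; requires the peft package."""
    # Lazy import keeps the plain settings usable without peft installed.
    from peft import LoraConfig

    # task_type is an assumption; it is not stated in the card.
    return LoraConfig(task_type="CAUSAL_LM", **LORA_KWARGS)


# 64 / 128 = 0.5, so the adapter update is scaled down by half.
print(lora_scaling(LORA_KWARGS["lora_alpha"], LORA_KWARGS["r"]))
```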

- Training Arguments

num_train_epochs=1
per_device_train_batch_size=1
gradient_accumulation_steps=4
optim="adamw_bnb_8bit"
save_steps=25
logging_steps=25
learning_rate=2e-4
weight_decay=0.001
fp16=False
bf16=False
max_grad_norm=0.3
max_steps=-1
warmup_ratio=0.03
group_by_length=True
lr_scheduler_type="constant"
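
The training arguments above can be sketched as a `transformers.TrainingArguments` call. The output directory, the number of GPUs and the dataset size are not given in this card, so `output_dir`, `num_devices` and `num_examples` below are hypothetical placeholders. The helpers also show the arithmetic behind the effective batch size; the step count is only an approximation, since the exact handling of a final partial accumulation step depends on the Trainer version.

```python
import math

# Values copied from the list above.
TRAINING_KWARGS = {
    "num_train_epochs": 1,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_bnb_8bit",
    "save_steps": 25,
    "logging_steps": 25,
    "learning_rate": 2e-4,
    "weight_decay": 0.001,
    "fp16": False,
    "bf16": False,
    "max_grad_norm": 0.3,
    "max_steps": -1,
    "warmup_ratio": 0.03,
    "group_by_length": True,
    "lr_scheduler_type": "constant",
}


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Sequences contributing to each optimizer step (num_devices is assumed)."""
    return per_device * grad_accum * num_devices


def approx_optimizer_steps(num_examples, eff_batch, epochs=1):
    """Approximate number of optimizer steps for a dataset of num_examples."""
    return math.ceil(num_examples / eff_batch) * epochs


def build_training_arguments(output_dir="./results"):
    """Build TrainingArguments; output_dir is a hypothetical placeholder."""
    # Lazy import keeps the plain settings usable without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

With one device, each optimizer step therefore sees 1 × 4 = 4 sequences. Note that in `transformers` the plain `"constant"` schedule appears to be built without a warmup phase, so `warmup_ratio` likely had no effect in this run; `"constant_with_warmup"` is the variant that uses it.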

Credits

Meta (https://huggingface.co/meta-llama): for the original Llama-3;
HuggingFace: for hosting this model and for creating the fine-tuning tools;
failspy (https://huggingface.co/failspy): for the base model and the orthogonalization implementation;
NobodyExistsOnTheInternet (https://huggingface.co/NobodyExistsOnTheInternet): for the incredible dataset;
Undi95 (https://huggingface.co/Undi95) and Sao10k (https://huggingface.co/Sao10K): my main inspirations for making these models =]

A huge thank you to all of them ☺️