flammen22C-mistral-7B
A Mistral 7B LLM built by merging pretrained models and finetuning on flammenai/casual-conversation-DPO. Flammen specializes in exceptional character roleplay, creative writing, and general intelligence.
Method
Finetuned using an A100 on Google Colab.
Reference: "Fine-tune a Mistral-7b model with Direct Preference Optimization" by Maxime Labonne
Configuration
System prompt, dataset formatting:
from datasets import load_dataset
from transformers import AutoTokenizer

def chatml_format(example):
    # Format the system message
    message = {"role": "system", "content": "You are an AI character talking to a human. Engage in casual conversation."}
    system = tokenizer.apply_chat_template([message], tokenize=False)

    # Format the instruction and open the assistant turn
    message = {"role": "user", "content": example['prompt']}
    prompt = tokenizer.apply_chat_template([message], tokenize=False, add_generation_prompt=True)

    # Format the chosen answer
    chosen = example['chosen'] + "<|im_end|>\n"

    # Format the rejected answer
    rejected = example['rejected'] + "<|im_end|>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }

dataset = load_dataset("flammenai/casual-conversation-DPO")['train']

# Save the original column names so they can be dropped after formatting
original_columns = dataset.column_names

# Tokenizer (model_name is defined elsewhere and is not shown here)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Format the dataset, keeping only the prompt/chosen/rejected columns
dataset = dataset.map(
    chatml_format,
    remove_columns=original_columns
)
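To make the formatting step concrete, here is a minimal sketch of the strings `chatml_format` would produce, assuming the tokenizer uses a plain ChatML template (`<|im_start|>role`, a newline, the content, then `<|im_end|>`). The helper `chatml_turn`, the function `chatml_format_sketch`, and the sample row are hypothetical stand-ins for `tokenizer.apply_chat_template` and a dataset record; the real template comes from the model's tokenizer and may differ.

```python
# Hypothetical stand-in for tokenizer.apply_chat_template,
# assuming a plain ChatML template (an assumption, not read from the tokenizer).
SYSTEM_PROMPT = "You are an AI character talking to a human. Engage in casual conversation."

def chatml_turn(role, content):
    """Render one message as a ChatML turn."""
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"

def chatml_format_sketch(example):
    """Mirror chatml_format without needing a tokenizer."""
    system = chatml_turn("system", SYSTEM_PROMPT)
    # add_generation_prompt=True corresponds to opening the assistant turn
    prompt = chatml_turn("user", example["prompt"]) + "<|im_start|>assistant\n"
    return {
        "prompt": system + prompt,
        "chosen": example["chosen"] + "<|im_end|>\n",
        "rejected": example["rejected"] + "<|im_end|>\n",
    }

# Made-up row for illustration only; it is not taken from the dataset
row = {
    "prompt": "How was your day?",
    "chosen": "Pretty good, thanks for asking!",
    "rejected": "I am an AI.",
}
formatted = chatml_format_sketch(row)

# Prompt followed by the preferred completion, as the trainer would see it
print(formatted["prompt"] + formatted["chosen"])
```

Because the prompt ends with the opened assistant turn, the chosen and rejected strings only need the closing `<|im_end|>` token, which is why the formatting function appends it by hand.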
LoRA, model, and training settings:
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune (model_name is defined elsewhere and is not shown here)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model: a second copy of the same checkpoint
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments (new_model is the output directory, defined elsewhere)
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=2000,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
# Note: this call signature matches older TRL releases; newer releases
# move beta and the length limits into DPOConfig.
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=2048,
    max_length=4096,
    force_use_ref_model=True
)

# Fine-tune model with DPO
dpo_trainer.train()
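As a rough illustration of what these settings imply, the sketch below works out the number of preference pairs per optimizer step and evaluates the standard sigmoid DPO loss for a single pair with `beta=0.1`. The function names, the single-device assumption, and the log-probability values are hypothetical and chosen only for illustration; this is not the trainer's internal code, and it assumes the trainer's loss type is left at the sigmoid variant.

```python
import math

# Values copied from the configuration above
BETA = 0.1
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 2000
NUM_DEVICES = 1  # assumption: a single A100, as stated in the Method section

def effective_batch_size(per_device=PER_DEVICE_BATCH, accum=GRAD_ACCUM_STEPS, devices=NUM_DEVICES):
    """Preference pairs consumed per optimizer step."""
    return per_device * accum * devices

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Sigmoid DPO loss for one pair, given summed sequence log-probabilities."""
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

pairs_per_step = effective_batch_size()
pairs_seen = pairs_per_step * MAX_STEPS
print(pairs_per_step, pairs_seen)

# Made-up log-probabilities: the policy still equals the reference model here,
# so the margin is zero and the loss equals log(2)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))

# The policy now favors the chosen answer more than the reference does,
# so the loss drops below log(2)
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

Under the single-device assumption, each optimizer step covers 16 pairs, so 2000 steps process 32,000 pairs in total, counting repeats if the dataset is smaller than that.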
Model tree for flammenai/flammen22C-mistral-7B
Base model: flammenai/flammen18X-mistral-7B
Finetuned: flammenai/flammen19X-mistral-7B
Finetuned: flammenai/flammen20-mistral-7B
Finetuned: flammenai/flammen21-mistral-7B
Finetuned: flammenai/flammen21X-mistral-7B
Finetuned: flammenai/flammen22-mistral-7B