---
library_name: transformers
license: apache-2.0
base_model:
- flammenai/flammen23-mistral-7B
datasets:
- flammenai/character-roleplay-DPO
---

![image/png](https://huggingface.co/nbeerbower/flammen13X-mistral-7B/resolve/main/flammen13x.png)

# flammen23-mistral-7B

A Mistral 7B LLM built by merging pretrained models and finetuning on [flammenai/character-roleplay-DPO](https://huggingface.co/datasets/flammenai/character-roleplay-DPO).
Flammen specializes in character roleplay, creative writing, and general intelligence.

### Method

Finetuned using an A100 on Google Colab, following [Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) by [Maxime Labonne](https://huggingface.co/mlabonne).

### Configuration

System prompt and dataset formatting:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

def chatml_format(example):
    # Format system prompt
    systemMessage = "Write a character roleplay dialogue using asterisk roleplay format based on the following character descriptions and scenario. (Each line in your response must be from the perspective of one of these characters)"
    system = "<|im_start|>system\n" + systemMessage + "<|im_end|>\n"

    # Format instruction
    prompt = "<|im_start|>user\n" + example['input'] + "<|im_end|>\n<|im_start|>assistant\n"

    # Format chosen answer
    chosen = example['output'] + "<|im_end|>\n"

    # Format rejected answer
    rejected = example['rejected'] + "<|im_end|>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }

dataset = load_dataset("flammenai/character-roleplay-DPO")['train']

# Save original column names so they can be dropped after formatting
original_columns = dataset.column_names

# Tokenizer (model_name is the base model being fine-tuned)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Format dataset into prompt/chosen/rejected triples for DPO
dataset = dataset.map(
    chatml_format,
    remove_columns=original_columns
)
```

LoRA, model, and training settings:

```python
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune (4-bit weights, bf16 compute)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model for DPO
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=350,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=4096,
    max_length=8192,
    force_use_ref_model=True
)
```
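
The card stops at the `DPOTrainer` construction. For reference, here is a minimal sketch of how training is typically launched and the resulting LoRA adapter merged back into the base model with this TRL/PEFT setup; the checkpoint directory name is an illustrative assumption, and `model_name`/`new_model` are the same variables used above.

```python
# Hypothetical continuation -- the training launch is not shown in the card above.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Run DPO training with the trainer configured above.
dpo_trainer.train()

# Save the trained LoRA adapter (directory name is illustrative).
dpo_trainer.model.save_pretrained("final_checkpoint")
tokenizer.save_pretrained("final_checkpoint")

# Reload the base model in bf16 and merge the adapter into it.
base_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
merged_model = PeftModel.from_pretrained(base_model, "final_checkpoint").merge_and_unload()

# Persist the merged weights under the new model name.
merged_model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)
```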
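
At inference time the model expects the same ChatML template used in `chatml_format`. Below is a minimal sketch of a prompt built that way, assuming the merged model is available under the repo id held in `new_model`; the repo id, placeholder scenario text, and generation settings are illustrative assumptions, not taken from this card.

```python
# Hypothetical inference example -- repo id and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(new_model)
model = AutoModelForCausalLM.from_pretrained(
    new_model, torch_dtype=torch.bfloat16, device_map="auto"
)

# Rebuild the ChatML prompt exactly as in chatml_format.
system = (
    "<|im_start|>system\n"
    "Write a character roleplay dialogue using asterisk roleplay format based on the "
    "following character descriptions and scenario. (Each line in your response must be "
    "from the perspective of one of these characters)<|im_end|>\n"
)
user = (
    "<|im_start|>user\n"
    "<your character descriptions and scenario here>"  # placeholder input
    "<|im_end|>\n<|im_start|>assistant\n"
)

inputs = tokenizer(system + user, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```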