Trying out adding a "classic Arsenic" tuning, more focused on the initial two datasets tested on Nemo, on top of the prior model's more-expansive set.
Since there's an argument to be made that we're trying to 'forget', or at least reduce the impact of, less-desired features, I made the LoRA rank ridiculously big (roughly 20% of total model parameters) to give the adapter more routes to possibly do that.
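For scale: a rank-r adapter on a linear layer of shape (d_out, d_in) adds about r * (d_in + d_out) parameters. Below is a back-of-envelope sketch; the layer shapes are assumed Qwen2.5-14B values rather than read from the checkpoint, so treat the total as indicative only.

```python
# Back-of-envelope adapter size at r=1024 with every linear layer targeted.
# Shapes are assumed Qwen2.5-14B values; DoRA's small per-column magnitude
# vectors are ignored as negligible.
r = 1024
layers = 48
shapes = {  # (d_in, d_out) per targeted linear in each decoder layer
    "q_proj": (5120, 5120),
    "k_proj": (5120, 1024),   # GQA: 8 kv heads x 128 head_dim
    "v_proj": (5120, 1024),
    "o_proj": (5120, 5120),
    "gate_proj": (5120, 13824),
    "up_proj": (5120, 13824),
    "down_proj": (13824, 5120),
}
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
print(f"~{per_layer * layers / 1e9:.1f}B adapter parameters")  # a sizable fraction of 14B
```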
The model ... got smarter on EQ-Bench?? I mean, this model and all its ancestors are within margin of error of each other, so who knows.
Censorship: a system prompt can steer it out of fully SFW territory, but it still has other boundaries. It defaults to a more professional register.
Style Example (Sonnet):
Upon the Dawning of Another Day
When silent morn doth gently break her way,
And misty shadows flee before the light;
The world in soft array, in rosy play,
From sleep to waking seems to take its flight.
A gentle breeze that whispers through the trees,
Doth murmur tales from night's enchanting land.
Each leaf a note of silent melody,
In morning's cool and dew-damp atmosphere expands.
Yet in this peace where heart might seem at rest,
The soul, it finds its own unquiet way.
For every dawn brings thoughts anew to test
Our courage, strength, or dreams that fade away.
So let us greet each morn with open hearts,
And cherish all the moments life imparts.
Testing:
EQ-Bench:
| Tasks    | Version | Filter | n-shot | Metric            |   | Value    |   | Stderr |
|----------|---------|--------|--------|-------------------|---|----------|---|--------|
| eq_bench | 2.1     | none   | 0      | eqbench           | ↑ | 80.2216  | ± | 1.4774 |
|          |         | none   | 0      | percent_parseable | ↑ | 100.0000 | ± | 0.0000 |
Ran IFEval locally on the Q4_K_M quant, partly to test whether the model improved or degraded relative to SuperNova-Medius, and partly because I figured out how to do so (sketched after the table below).
I ... don't actually know.
The SuperNova-Medius card gave a single IFEval value without specifying which metric, and it wasn't the Leaderboard's value. The Leaderboard normalizes scores against a random baseline, but it's unclear to me whether IFEval has a nonzero lower bound to normalize from.
It's entirely possible the model degraded on loose accuracy (which re-checks compliance after stripping common formatting like markdown) and improved on strict. (And, of course, this is testing a ~4-bit quantization. Grain of salt.)
| Tasks  | Version | Filter | n-shot | Metric                  |   | Value  |   | Stderr |
|--------|---------|--------|--------|-------------------------|---|--------|---|--------|
| ifeval | 4       | none   | 0      | inst_level_loose_acc    | ↑ | 0.8070 | ± | N/A    |
|        |         | none   | 0      | inst_level_strict_acc   | ↑ | 0.7614 | ± | N/A    |
|        |         | none   | 0      | prompt_level_loose_acc  | ↑ | 0.7301 | ± | 0.0191 |
|        |         | none   | 0      | prompt_level_strict_acc | ↑ | 0.6728 | ± | 0.0202 |
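For reference, a minimal sketch of how a local run like this can be driven through lm-evaluation-harness's Python API (0.4.x). The `gguf` backend choice and the endpoint are assumptions about the setup, not a record of the exact command used; it presumes a llama-cpp-python server is already serving the Q4_K_M GGUF.

```python
# Hedged sketch: 0-shot IFEval against a locally served GGUF quant via
# lm-evaluation-harness 0.4.x. Assumes a llama-cpp-python server is running
# at base_url; adjust backend and endpoint to your setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="gguf",                                 # llama.cpp server backend
    model_args="base_url=http://localhost:8000",  # assumed local endpoint
    tasks=["ifeval"],
    num_fewshot=0,
)
print(results["results"]["ifeval"])  # prompt/inst-level strict & loose accuracies
```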
See axolotl config

axolotl version: 0.4.1

```yaml
base_model: Lambent/Eidolon-v2-14B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
save_safetensors: true
load_in_8bit: false
load_in_4bit: true
strict: false
rl: dpo
# total_num_tokens:
datasets:
- path: jondurbin/gutenberg-dpo-v0.1
split: train
type: chatml.prompt_pairs
- path: unalignment/toxic-dpo-v0.2
split: train
type: chatml.prompt_pairs
dataset_prepared_path: prepared-dpo
output_dir: ./dpoq
val_set_size: 0.01
seed: 2
sequence_len: 2048
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false
adapter: qlora
lora_model_dir:
lora_r: 1024
lora_alpha: 1024
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_dora: true
wandb_project: eidolon-qwen2.5-qlora-dpo
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00001
#cosine_min_lr_ratio: 0.1
#cosine_constant_lr_ratio: 0.95
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 16
evals_per_epoch: 8
saves_per_epoch: 8
save_total_limit: 2
debug:
deepspeed:
weight_decay: 0.001
fsdp:
fsdp_config:
```
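As a rough PEFT translation of the adapter block above (a sketch, assuming PEFT >= 0.13 for DoRA support; this mirrors the settings rather than reproducing axolotl's internal construction):

```python
# Approximate PEFT equivalent of the axolotl adapter settings: rank-1024 DoRA
# (peft_use_dora: true) over all linear layers (lora_target_linear: true).
from peft import LoraConfig

adapter_cfg = LoraConfig(
    r=1024,
    lora_alpha=1024,
    lora_dropout=0.05,
    use_dora=True,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```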
dpoq
This model is a fine-tuned version of Lambent/Eidolon-v2-14B on the jondurbin/gutenberg-dpo-v0.1 and unalignment/toxic-dpo-v0.2 datasets.
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
DPO preference pairs from jondurbin/gutenberg-dpo-v0.1 and unalignment/toxic-dpo-v0.2 (see the axolotl config above), with a 1% validation split (val_set_size: 0.01).
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 32
- optimizer: paged 8-bit AdamW (paged_adamw_8bit) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 16
- training_steps: 46
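The schedule above follows directly from the config: 1 micro-batch x 32 gradient-accumulation steps = 32 examples per optimizer step, and one epoch over the prepared pairs gives the 46 steps. A quick check, with the train-set size as the one assumed quantity (46 steps implies roughly 1441 to 1472 pairs):

```python
# Sanity-check the step count implied by the hyperparameters above.
import math

micro_batch_size = 1
grad_accum = 32
effective_batch = micro_batch_size * grad_accum  # 32 (single GPU assumed)

n_train = 1450  # assumed: pairs remaining after the 1% validation split
steps_per_epoch = math.ceil(n_train / effective_batch)
print(effective_batch, steps_per_epoch)  # 32 46
```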
Training results
Framework versions
- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.3.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1