Trying out adding a "classic Arsenic" tuning, more focused on the initial two datasets tested on Nemo, on top of the prior model's more-expansive set.
Since there's an argument to be made that we're trying to 'forget', or at least reduce the impact of, less-desired features, I made the LoRA rank ridiculously big (roughly 20% of total model parameters) to give the adapter more routes to possibly do that.
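For scale: a rank-r adapter on a linear layer of shape (d_out, d_in) adds about r * (d_in + d_out) parameters. Below is a back-of-envelope sketch; the layer shapes are assumed Qwen2.5-14B values rather than read from the checkpoint, so treat the total as indicative only.

```python
# Back-of-envelope adapter size at r=1024 with every linear layer targeted.
# Shapes are assumed Qwen2.5-14B values; DoRA's small per-column magnitude
# vectors are ignored as negligible.
r = 1024
layers = 48
shapes = {  # (d_in, d_out) per targeted linear in each decoder layer
    "q_proj": (5120, 5120),
    "k_proj": (5120, 1024),   # GQA: 8 kv heads x 128 head_dim
    "v_proj": (5120, 1024),
    "o_proj": (5120, 5120),
    "gate_proj": (5120, 13824),
    "up_proj": (5120, 13824),
    "down_proj": (13824, 5120),
}
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
print(f"~{per_layer * layers / 1e9:.1f}B adapter parameters")  # a sizable fraction of 14B
```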
The model ... got smarter on EQ-Bench?? I mean, this model and all its ancestors are within margin of error of each other, so who knows.
Censorship: a system prompt can steer it out of fully SFW territory, but it still has other boundaries. It defaults to a more professional register.
Style Example (Sonnet):
Upon the Dawning of Another Day
When silent morn doth gently break her way,
And misty shadows flee before the light;
The world in soft array, in rosy play,
From sleep to waking seems to take its flight.
A gentle breeze that whispers through the trees,
Doth murmur tales from night's enchanting land.
Each leaf a note of silent melody,
In morning's cool and dew-damp atmosphere expands.
Yet in this peace where heart might seem at rest,
The soul, it finds its own unquiet way.
For every dawn brings thoughts anew to test
Our courage, strength, or dreams that fade away.
So let us greet each morn with open hearts,
And cherish all the moments life imparts.
Testing:
EQ-Bench:
| Tasks    | Version | Filter | n-shot | Metric            |   | Value    |   | Stderr |
|----------|---------|--------|--------|-------------------|---|----------|---|--------|
| eq_bench | 2.1     | none   | 0      | eqbench           | ↑ | 80.2216  | ± | 1.4774 |
|          |         | none   | 0      | percent_parseable | ↑ | 100.0000 | ± | 0.0000 |
Ran IFEval locally on the Q4_K_M quant, partly to test whether the model improved or degraded relative to SuperNova-Medius, and partly because I figured out how to do so (sketched after the table below).
I ... don't actually know.
The SuperNova-Medius card gave a single IFEval value without specifying which metric, and it wasn't the Leaderboard's value. The Leaderboard normalizes scores against a random baseline, but it's unclear to me whether IFEval has a nonzero lower bound to normalize from.
It's entirely possible the model degraded on loose accuracy (which re-checks compliance after stripping common formatting like markdown) and improved on strict. (And, of course, this is testing a ~4-bit quantization. Grain of salt.)
| Tasks  | Version | Filter | n-shot | Metric                  |   | Value  |   | Stderr |
|--------|---------|--------|--------|-------------------------|---|--------|---|--------|
| ifeval | 4       | none   | 0      | inst_level_loose_acc    | ↑ | 0.8070 | ± | N/A    |
|        |         | none   | 0      | inst_level_strict_acc   | ↑ | 0.7614 | ± | N/A    |
|        |         | none   | 0      | prompt_level_loose_acc  | ↑ | 0.7301 | ± | 0.0191 |
|        |         | none   | 0      | prompt_level_strict_acc | ↑ | 0.6728 | ± | 0.0202 |
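For reference, a minimal sketch of how a local run like this can be driven through lm-evaluation-harness's Python API (0.4.x). The `gguf` backend choice and the endpoint are assumptions about the setup, not a record of the exact command used; it presumes a llama-cpp-python server is already serving the Q4_K_M GGUF.

```python
# Hedged sketch: 0-shot IFEval against a locally served GGUF quant via
# lm-evaluation-harness 0.4.x. Assumes a llama-cpp-python server is running
# at base_url; adjust backend and endpoint to your setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="gguf",                                 # llama.cpp server backend
    model_args="base_url=http://localhost:8000",  # assumed local endpoint
    tasks=["ifeval"],
    num_fewshot=0,
)
print(results["results"]["ifeval"])  # prompt/inst-level strict & loose accuracies
```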
See axolotl config

axolotl version: 0.4.1

```yaml
base_model: Lambent/Eidolon-v2-14B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
save_safetensors: true
load_in_8bit: false
load_in_4bit: true
strict: false
rl: dpo
# total_num_tokens:
datasets:
- path: jondurbin/gutenberg-dpo-v0.1
split: train
type: chatml.prompt_pairs
- path: unalignment/toxic-dpo-v0.2
split: train
type: chatml.prompt_pairs
dataset_prepared_path: prepared-dpo
output_dir: ./dpoq
val_set_size: 0.01
seed: 2
sequence_len: 2048
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false
adapter: qlora
lora_model_dir:
lora_r: 1024
lora_alpha: 1024
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_dora: true
wandb_project: eidolon-qwen2.5-qlora-dpo
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00001
#cosine_min_lr_ratio: 0.1
#cosine_constant_lr_ratio: 0.95
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 16
evals_per_epoch: 8
saves_per_epoch: 8
save_total_limit: 2
debug:
deepspeed:
weight_decay: 0.001
fsdp:
fsdp_config:
```
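As a rough PEFT translation of the adapter block above (a sketch, assuming PEFT >= 0.13 for DoRA support; this mirrors the settings rather than reproducing axolotl's internal construction):

```python
# Approximate PEFT equivalent of the axolotl adapter settings: rank-1024 DoRA
# (peft_use_dora: true) over all linear layers (lora_target_linear: true).
from peft import LoraConfig

adapter_cfg = LoraConfig(
    r=1024,
    lora_alpha=1024,
    lora_dropout=0.05,
    use_dora=True,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```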
dpoq
This model is a fine-tuned version of Lambent/Eidolon-v2-14B on the jondurbin/gutenberg-dpo-v0.1 and unalignment/toxic-dpo-v0.2 datasets.
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
DPO preference pairs from jondurbin/gutenberg-dpo-v0.1 and unalignment/toxic-dpo-v0.2 (see the axolotl config above), with a 1% validation split (val_set_size: 0.01).
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 32
- optimizer: paged 8-bit AdamW (paged_adamw_8bit) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 16
- training_steps: 46
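The schedule above follows directly from the config: 1 micro-batch x 32 gradient-accumulation steps = 32 examples per optimizer step, and one epoch over the prepared pairs gives the 46 steps. A quick check, with the train-set size as the one assumed quantity (46 steps implies roughly 1441 to 1472 pairs):

```python
# Sanity-check the step count implied by the hyperparameters above.
import math

micro_batch_size = 1
grad_accum = 32
effective_batch = micro_batch_size * grad_accum  # 32 (single GPU assumed)

n_train = 1450  # assumed: pairs remaining after the 1% validation split
steps_per_epoch = math.ceil(n_train / effective_batch)
print(effective_batch, steps_per_epoch)  # 32 46
```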
Training results
Framework versions
- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.3.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1