Is there a reason why abliterated models are not used to avoid refusals?

#2
by Phr00t - opened
EVA-UNIT-01 org
edited Oct 24

It's tuned on top of the base model, not the instruct model, and the base isn't censored in the first place

@Phr00t Yeah, because of that ^, abliteration won't mitigate the refusals on this one. I've tried "Lorabliterating" models like this before.
If you're seeing refusals, they're coming from the synthetic datasets used to train this model. (You can sometimes spot them by searching the datasets for phrases like "I will not engage".)
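For anyone who wants to check, here's a minimal sketch of that kind of dataset search using the `datasets` library. The dataset name, split, and marker strings are placeholders; substitute the actual training sets and whatever refusal phrases you're hunting for:

```python
from datasets import load_dataset

# Placeholder strings to grep for; extend with whatever refusal phrasing you expect.
REFUSAL_MARKERS = ["I will not engage", "I cannot assist"]

# "your-org/your-dataset" is a placeholder, not one of this model's actual sets.
ds = load_dataset("your-org/your-dataset", split="train")

hits = []
for i, row in enumerate(ds):
    text = str(row)  # crude but schema-agnostic: searches every column at once
    if any(marker in text for marker in REFUSAL_MARKERS):
        hits.append(i)

print(f"{len(hits)} of {len(ds)} rows contain a refusal marker; first few: {hits[:10]}")
```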

You can always abliterate this model if it's a problem :)
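For reference, abliteration usually means estimating a "refusal direction" in activation space (e.g. the mean activation difference between prompts the model refuses and prompts it answers) and then projecting that direction out of the weights. Below is a minimal sketch of just the removal step, assuming the direction has already been computed; `refusal_direction` and the module path in the usage comment are hypothetical and vary by architecture:

```python
import torch

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Return (I - d d^T) @ W: the refusal direction is projected out of the
    weight's output space, so W @ x has no component along it."""
    d = direction / direction.norm()          # unit refusal direction, shape (d_model,)
    return weight - torch.outer(d, d @ weight)  # subtract the rank-1 projection

# Hypothetical usage on one decoder block's MLP output projection:
# layer.mlp.down_proj.weight.data = orthogonalize(
#     layer.mlp.down_proj.weight.data, refusal_direction)
```

Applied to every layer's output projections, this is the "weight orthogonalization" form of abliteration; the same idea can be packaged as a LoRA delta instead of an in-place edit, which is what "Lorabliterating" above refers to.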

EVA-UNIT-01 org

If there are any remaining refusals in the sets, it's likely no more than a few rows. Unlikely to be a notable issue.

Right, I was just explaining it generally for them. Yours looks good; downloading the model.
