[SOLVED] Refuses to generate explicit content directly

by ceoofcapybaras - opened Jul 22

Discussion

ceoofcapybaras

Jul 22

Was this model trained on base non-abliterated version of llama3? It refuses to generate explicit content if asked directly with short prompts like "f me" and this is with recommended system prompt (fully copied from readme with char names replaced). More complicated and indirect conversations work, but this makes me think that the model is very unstable and can produce llama-3-original alignment refusals at any time, especially with short conversations.

aaronday3

Nothing is Real org Jul 22

•

edited Jul 22

In roleplay scenarios it has no issues, outside that you need to Prefill a few words. Even something like "Fuck yeah!" at the start of assistant response works.

Yes, we are working to resolve this btw.

aaronday3 changed discussion title from Refuses to generate explicit content directly to [SOLVED] Refuses to generate explicit content directly Jul 22

AuriAetherwiing

Nothing is Real org Jul 22

also yes, model was trained on non-ablated L3, bc ablation will cause more hallucinations to appear, and strip characters played by model of agency. Ablated models are yes-men basically, they can't refuse anything in any context, which is not always desirable.

ceoofcapybaras

Jul 22

Got it, thanks for explanation.

ceoofcapybaras changed discussion status to closed Jul 22

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment