Update README.md
README.md
CHANGED
@@ -34,6 +34,8 @@ Now that the new Phi-3 models are out, I'm working on completing this abliterati
This is [microsoft/Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct) with orthogonalized bfloat16 safetensor weights, generated with a refined methodology based on the one described in the preview paper/blog post '[Refusal in LLMs is mediated by a single direction](https://www.alignmentforum.org/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction)', which I encourage you to read to understand more.

+[GGUF Quants](https://huggingface.co/failspy/Phi-3-medium-4k-instruct-abliterated-v3-GGUF)
+
## Hang on, "abliterated"? Orthogonalization? Ablation? What is this?

TL;DR: This model has had certain weights manipulated to "inhibit" its ability to express refusal. It is not in any way _guaranteed_ that it won't refuse you or that it will understand your request, and it may still lecture you about ethics/safety, etc. It is tuned in all other respects the same as the original Phi-3-medium-4k-instruct model was, just with the strongest refusal directions orthogonalized out.
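
For readers who want a concrete picture of what "orthogonalized out" means, here is a minimal sketch of the weight orthogonalization step as the linked post describes it: W' = W - r rᵀ W, where r is a unit-length direction. This is a toy, dependency-free illustration and not the pipeline used to produce this model. The function names, the small matrix, and the direction vector are all made up for the example, and finding a real refusal direction is a separate step that is not shown.

```python
import math


def orthogonalize(weight, direction):
    """Return a copy of `weight` with `direction` removed from its output space.

    weight:    list of rows, shape (d_out, d_in); it writes vectors of size d_out.
    direction: list of length d_out (a hypothetical "refusal direction").
    Implements W' = W - r r^T W, with r normalized to unit length.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    r = [x / norm for x in direction]

    n_rows, n_cols = len(weight), len(weight[0])
    # r^T W: how strongly each column of W points along r.
    along_r = [sum(r[i] * weight[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Subtract that component from every column.
    return [
        [weight[i][j] - r[i] * along_r[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


def matvec(weight, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# Toy example: a 3x2 matrix and an arbitrary direction in its output space.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
r = [1.0, 1.0, 0.0]
W_orth = orthogonalize(W, r)

# Whatever the input, the output now has no component along r.
y = matvec(W_orth, [0.7, -1.3])
component_along_r = sum(a * b for a, b in zip(r, y))
print(abs(component_along_r) < 1e-9)
```

After this step the matrix can no longer write anything along r, whatever its input. Components orthogonal to r are left untouched, which matches the claim above that the model is otherwise tuned the same as the original.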