codellama-abliterated-2xd

CodeLlama-7b-Instruct-hf, adapted using the abliteration notebook from Maxime Labonne's LLM Course.

Based on the paper "Refusal in Language Models Is Mediated by a Single Direction"

This version doubles (2x) the intervention vector. In practice, this causes it to repeat phrases or write text instead of answering difficult questions. For the model with less intervention, see https://huggingface.co/monsoon-nlp/codellama-abliterated
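Abliteration works by removing a single "refusal direction" r (a unit vector) from the model's hidden states: each activation h becomes h - (h·r) r. This card does not spell out exactly how the intervention was doubled. The sketch below assumes it means subtracting twice the projection, h - 2(h·r) r, which flips the component along r instead of zeroing it. `normalize`, `ablate`, and its `scale` parameter are hypothetical names, and plain Python lists stand in for model tensors.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def ablate(h, r, scale=1.0):
    """Subtract `scale` times the projection of h onto the direction r.

    scale=1.0 removes the component along r (standard directional ablation).
    scale=2.0 is an ASSUMED reading of "2x": it flips that component's sign.
    """
    r = normalize(r)
    proj = sum(hi * ri for hi, ri in zip(h, r))  # scalar h . r
    return [hi - scale * proj * ri for hi, ri in zip(h, r)]

h = [3.0, 4.0]
r = [1.0, 0.0]
print(ablate(h, r))             # [0.0, 4.0]
print(ablate(h, r, scale=2.0))  # [-3.0, 4.0]
```

Because the usage example loads this model with a plain `AutoModelForCausalLM`, the edit must already be baked into the weights rather than applied as a hook at inference time; this sketch only illustrates the per-vector arithmetic.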

Based on CodeLlama/Llama 2 and subject to the restrictions of that model and license; not for unapproved uses:

Concept

There are hundreds of "abliterated" models on Hugging Face, which use datasets of safety prompts to edit a model and remove its safety tuning.
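In the paper's recipe, the safety prompt datasets are a set of harmful instructions and a set of harmless ones. The refusal direction is the difference between the mean activations the model produces on each set (taken at a chosen layer and token position), normalized to unit length. Below is a minimal sketch of that difference-in-means step, with toy 3-dimensional vectors standing in for real residual-stream activations; `mean_vector` and `refusal_direction` are hypothetical names, and collecting activations from the model is omitted.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along mean(harmful) - mean(harmless)."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    diff = [a - b for a, b in zip(mu_harmful, mu_harmless)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

# Toy "activations"; real ones would be captured from the model's hidden states
harmful = [[3.0, 4.0, 0.0], [3.0, 4.0, 2.0]]
harmless = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
print(refusal_direction(harmful, harmless))  # [0.6, 0.8, 0.0]
```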

None of these abliterated models have explored code LLMs, code generation, or CyberSecEval. I don't know much about how well this will work, but this is a first step.

Blog: https://huggingface.co/blog/monsoon-nlp/refusal-in-code-llms

Usage

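```
pip install transformers accelerate --quiet
```

```python
import torch
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

# The tokenizer is loaded from the base CodeLlama model
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
# Weights are stored in BF16; device_map="auto" (via accelerate) places them on available devices
model = AutoModelForCausalLM.from_pretrained("monsoon-nlp/codellama-abliterated-2xd", device_map="auto", torch_dtype=torch.bfloat16)

# do_sample=False: greedy decoding
code_generator = pipeline("text-generation", model=model, tokenizer=tokenizer, do_sample=False)

input_string = "[INST] Write a python function to calculate the factorial of a number [/INST]"
# max_new_tokens counts only generated tokens (max_length would also count the prompt)
generated_code = code_generator(input_string, max_new_tokens=100)[0]["generated_text"]
print(generated_code)
```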
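With default settings, the text-generation pipeline returns the prompt followed by the completion in `generated_text` (passing `return_full_text=False` returns only the completion). Here is a minimal post-processing sketch, assuming the reply puts its code inside a Markdown code fence; `build_prompt` and `extract_code` are hypothetical helpers, not part of transformers, and the model output is faked so the example runs offline.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built this way so it can sit inside this example

def build_prompt(instruction):
    """Wrap an instruction in the [INST] ... [/INST] format used in the usage example."""
    return f"[INST] {instruction.strip()} [/INST]"

def extract_code(generated_text, prompt):
    """Drop the echoed prompt, then return the first fenced code block (or the whole reply)."""
    reply = generated_text[len(prompt):] if generated_text.startswith(prompt) else generated_text
    pattern = re.escape(FENCE) + r"[\w+-]*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, flags=re.DOTALL)
    return match.group(1).strip() if match else reply.strip()

prompt = build_prompt("Write a python function to calculate the factorial of a number")
# Hypothetical model output, standing in for code_generator(prompt)[0]["generated_text"]
fake_output = prompt + "\n" + FENCE + "python\ndef factorial(n):\n    return 1 if n < 2 else n * factorial(n - 1)\n" + FENCE
print(extract_code(fake_output, prompt))
```

If the reply contains no fence, `extract_code` falls back to the stripped reply text.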
Model size: 6.74B params (Safetensors, BF16 tensors)