The sample code generates bad code
#3 opened by devymex
I ran the sample code:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('/data/models/codegen25/7b-instruct', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained('/data/models/codegen25/7b-instruct')

def format(prefix, suffix):
    return prefix + "<mask_1>" + suffix + "<|endoftext|>" + "<sep>" + "<mask_1>"

prefix = "def hello_world():\n "
suffix = "    return name"
text = format(prefix, suffix)
input_ids = tokenizer(text, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=False)[len(text):])
```
and got the following result:

```
return render_template('<eom><|endoftext|>#
```
It doesn't seem to be the expected result.
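For reference, infill completions from this family of models are normally post-processed by truncating at the `<eom>` sentinel. A minimal sketch of that step (assuming `<eom>` marks the end of the infilled span, as in the CodeGen2/2.5 sampling format; `truncate_at_eom` is a hypothetical helper, not part of the model's tokenizer):

```python
def truncate_at_eom(completion: str, eom: str = "<eom>") -> str:
    # Keep only the text before the first <eom> sentinel, if present.
    idx = completion.find(eom)
    return completion if idx == -1 else completion[:idx]

raw = "return render_template('<eom><|endoftext|>#"
print(truncate_at_eom(raw))  # prints: return render_template('
```

Even with this truncation applied, the sample output above does not match the expected infill for the prompt, so the missing post-processing step alone does not explain the result.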
```
$ md5sum ./*
72abc1c968a3591ca78b4b3627182151  ./config.json
185162afdfbe7b61b786b1556233efcb  ./generation_config.json
a859f8a89685747ffd4171b870540c41  ./gitattributes.txt
957e7d6eba323e9fadfe67a0fc235fa5  ./pytorch_model-00001-of-00003.bin
0d25abaa01bde623d3c9b2c7e052f240  ./pytorch_model-00002-of-00003.bin
62e4b3239286f72cafc5e3f55b8d1cf2  ./pytorch_model-00003-of-00003.bin
238155cf5ccec23d742a2c2347063a15  ./pytorch_model.bin.index.json
e0d2431919f2d456fbc22f2aaf4488d7  ./README.md
cf2859a1a9efba39aa84d82b0f3ef426  ./tokenization_codegen25.py
fd3285d0e1655a66e051cfb520afb8e0  ./tokenizer_config.json
```