How to use?

#2
opened by Stefan-LTB

I tried the provided example:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda"

# Load model and tokenizer; device_map="auto" places the weights automatically.
model = AutoModelForCausalLM.from_pretrained(
    "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1")

prompt = "Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft"
messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": prompt}
]
# Render the chat messages into a single prompt string.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated part remains.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

The response is:

Du bist ein hilfreicher Assistent.
Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft.
<|im_start|>assistant<|im_end|> <|im_start|>assistant<|im_end|> <|im_start|>assistant<|im_end|> <|im_start|>assistant<|im_end|>
[... the same <|im_start|>assistant<|im_end|> sequence repeats until the 512-token budget is exhausted, ending mid-token with "<|" ...]

I get these warnings:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

No chat template is defined for this tokenizer - using a default chat template that implements the ChatML format (without BOS/EOS tokens!). If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
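
For reference, the last two warnings can be silenced by passing the attention mask and an explicit pad token id to generate(); a minimal sketch, reusing model_inputs and tokenizer from the example above:

# Hand generate() the attention mask and a pad token id explicitly
# so it does not have to infer them.
generated_ids = model.generate(
    model_inputs.input_ids,
    attention_mask=model_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=512
)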

What kind of template do I have to use for the model to work as expected?
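
One quick way to check is to inspect what the loaded tokenizer actually ships; a minimal sketch:

# Unset here, which is why the ChatML fallback from the warning kicks in;
# a tokenizer with a proper template prints its Jinja string instead.
print(tokenizer.chat_template)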

Reading helps: the instruct version works as expected.
But now I'm looking for a good GGUF version of the instruct model.
Also, can I build my own chat model by fine-tuning the instruct version with my own instructions? (See the sketch below.)
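
Further fine-tuning the instruct checkpoint is a common route. A minimal LoRA sketch follows; the peft library and all hyperparameters here are assumptions for illustration, not anything this repo prescribes:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Start from the instruct checkpoint rather than the base model.
base = AutoModelForCausalLM.from_pretrained(
    "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1",
    torch_dtype="auto",
    device_map="auto"
)
# Train small adapter matrices instead of all 8B weights.
config = LoraConfig(
    r=16,                                # adapter rank (assumption)
    lora_alpha=32,                       # scaling factor (assumption)
    target_modules=["q_proj", "v_proj"], # common choice for Llama-style attention
    task_type="CAUSAL_LM"
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
# Then train on your own chat-formatted data with the usual Trainer/TRL setup.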
