---
license: apache-2.0
base_model:
  - internlm/internlm3-8b-instruct
tags:
  - llama
  - internlm3
---

# Converted LLaMA from InternLM3-8B-Instruct

## Description

This model was converted from InternLM3-8B-Instruct to the LLaMA format. The conversion lets you load and run InternLM3-8B-Instruct as a standard LLaMA model, which is convenient for some inference use cases. The precision is exactly the same as the original model.

## Usage

You can load the model with the `LlamaForCausalLM` class as shown below:

```python
from transformers import AutoTokenizer, LlamaForCausalLM

device = "cuda"  # the device to load the model onto, "cpu" or "cuda"
attn_impl = "eager"  # the attention implementation to use

# "LLMs and AI have seen two years of rapid development; on this theme, write a
# New Year message to AI practitioners"
prompt = "大模型和人工智能经历了两年的快速发展,请你以此主题对人工智能的从业者写一段新年寄语"

system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文."""
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]

tokenizer = AutoTokenizer.from_pretrained("silence09/InternLM3-8B-Instruct-Converted-LlaMA", trust_remote_code=True)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
print(prompt)
llama_model = LlamaForCausalLM.from_pretrained(
    "silence09/InternLM3-8B-Instruct-Converted-LlaMA",
    torch_dtype="auto",
    attn_implementation=attn_impl).to(device)
llama_generated_ids = llama_model.generate(model_inputs.input_ids, max_new_tokens=100, do_sample=False)
llama_generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, llama_generated_ids)
]
llama_response = tokenizer.batch_decode(llama_generated_ids, skip_special_tokens=True)[0]
print(llama_response)
```

## Precision Guarantee

To compare the results with those of the original model, you can use this code.
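As a minimal sketch of such a comparison (the tolerance handling and the usage outline below are assumptions, not the repository's actual script): run the same input through both checkpoints with greedy decoding disabled and verify that the logits agree.

```python
import torch

def logits_match(logits_a: torch.Tensor, logits_b: torch.Tensor, atol: float = 0.0) -> bool:
    """Return True if two logit tensors agree within atol (atol=0.0 means bit-exact)."""
    return torch.allclose(logits_a, logits_b, atol=atol, rtol=0.0)

# Usage sketch (assumes both checkpoints are downloadable and fit in memory):
#   orig = AutoModelForCausalLM.from_pretrained("internlm/internlm3-8b-instruct",
#                                               trust_remote_code=True)
#   conv = LlamaForCausalLM.from_pretrained("silence09/InternLM3-8B-Instruct-Converted-LlaMA")
#   with torch.no_grad():
#       assert logits_match(orig(**model_inputs).logits, conv(**model_inputs).logits)
```

Note that a non-zero `atol` may be needed if the two runs use different attention implementations or dtypes, since those can introduce small numerical differences even for equivalent weights.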

## More Info

The model was converted using the Python script available at this repository.
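Conceptually, a conversion like this renames the checkpoint's weight keys from the source layout to the names `LlamaForCausalLM` expects, leaving the tensor values untouched (which is why precision is preserved). The helper and the example mapping below are a hypothetical illustration of that idea, not the actual script's mapping:

```python
def rename_keys(state_dict: dict, key_map: dict) -> dict:
    """Rename state-dict keys by prefix according to key_map; unmapped keys pass through."""
    out = {}
    for name, tensor in state_dict.items():
        new_name = name
        for src, dst in key_map.items():
            if name.startswith(src):
                new_name = dst + name[len(src):]
                break
        out[new_name] = tensor
    return out

# Hypothetical prefix mapping for one layer (illustrative only; consult the
# conversion script for the real key correspondence):
KEY_MAP = {
    "model.layers.0.attention.": "model.layers.0.self_attn.",
}
```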