Phi-3.5-vision-instruct-onnx-cpu
- Note: This is an unofficial version, intended only for testing and development.
This is the ONNX format FP32 version of Microsoft Phi-3.5-vision-instruct for CPU. You can follow the steps below to convert it yourself.
Convert step by step
- Installation
pip install torch transformers onnx onnxruntime
pip install --pre onnxruntime-genai
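To confirm the packages installed correctly before continuing, a quick check like the following can be run (a minimal sketch; the names match the pip packages above):

# Sanity check: print the installed version of each required package
from importlib.metadata import version

for pkg in ("torch", "transformers", "onnx", "onnxruntime", "onnxruntime-genai"):
    print(pkg, version(pkg))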
- Set up a working folder in the terminal
mkdir models
cd models
- Download microsoft/Phi-3.5-vision-instruct into the models folder (a scripted download sketch follows below)
https://huggingface.co/microsoft/Phi-3.5-vision-instruct
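If you prefer to script the download instead of using the web page, a sketch like this can pull the repository. It assumes the huggingface_hub package is installed and the script runs from inside the models folder; the local_dir name is just an assumed target:

# Minimal sketch: download microsoft/Phi-3.5-vision-instruct into ./Phi-3.5-vision-instruct
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="microsoft/Phi-3.5-vision-instruct",
    local_dir="./Phi-3.5-vision-instruct",  # assumed folder name inside models
)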
- Download the following files into your Phi-3.5-vision-instruct folder
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/modeling_phi3_v.py
- Download this file into the models folder (a sketch that fetches all three files follows after the link)
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/build.py
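Instead of saving the three files by hand, a sketch like the following can fetch them and copy them into place. It assumes huggingface_hub is installed, the script runs from the models folder, and the Phi-3.5-vision-instruct folder name matches your download; adjust the paths if yours differ:

# Minimal sketch: fetch the helper files from lokinfey/Phi-3.5-vision-instruct-onnx-cpu
import shutil
from huggingface_hub import hf_hub_download

REPO = "lokinfey/Phi-3.5-vision-instruct-onnx-cpu"
PHI_DIR = "./Phi-3.5-vision-instruct"  # your Phi-3.5-vision-instruct folder (adjust as needed)

# config.json and modeling_phi3_v.py go into the Phi-3.5-vision-instruct folder
for name in ("config.json", "modeling_phi3_v.py"):
    cached = hf_hub_download(repo_id=REPO, filename=f"onnx/{name}")
    shutil.copy(cached, f"{PHI_DIR}/{name}")

# build.py goes into the models folder (the current directory here)
shutil.copy(hf_hub_download(repo_id=REPO, filename="onnx/build.py"), "./build.py")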
- Go back to the terminal and convert the model to ONNX (FP32, CPU)
python build.py -i ".\Your Phi-3.5-vision-instruct Path" -o .\vision-cpu-fp32 -p f32 -e cpu
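Here -i is the input model folder, -o the output folder, -p f32 selects FP32 precision, and -e cpu targets the CPU execution provider. Once the script finishes, you can quickly confirm that the output folder was produced (a minimal sketch; the exact file list depends on build.py):

# Minimal sketch: list the files produced by the conversion
import os
print(os.listdir("./vision-cpu-fp32"))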
Running it with ONNX Runtime GenAI
import onnxruntime_genai as og
# Define the path to the converted ONNX model folder (the output of build.py, e.g. .\vision-cpu-fp32)
model_path = './Your Phi-3.5-vision-instruct Path'
# Define the path to the image file
# This path points to an image file that will be used for demonstration or testing
img_path = './Your Image Path'
# Create an instance of the Model class from the onnxruntime_genai module
# This instance is initialized with the path to the model file
model = og.Model(model_path)
# Create a multimodal processor using the model instance
# This processor will handle different types of input data (e.g., text, images)
processor = model.create_multimodal_processor()
# Create a stream for tokenizing input data using the processor
# This stream will be used to process and tokenize the input data for the model
tokenizer_stream = processor.create_stream()
# The text portion of the user's prompt (placeholder)
text = "Your Prompt"
# Initialize a string variable for the prompt with a user tag
prompt = "<|user|>\n"
# Append an image tag to the prompt
prompt += "<|image_1|>\n"
# Append the text prompt to the prompt string, followed by an end tag
prompt += f"{text}<|end|>\n"
# Append an assistant tag to the prompt, indicating the start of the assistant's response
prompt += "<|assistant|>\n"
# Load the image and run the processor to build the multimodal inputs for the model
image = og.Images.open(img_path)
inputs = processor(prompt, images=image)
# Create an instance of the GeneratorParams class from the onnxruntime_genai module
# This instance is initialized with the model object
params = og.GeneratorParams(model)
# Set the inputs for the generator parameters using the processed inputs
params.set_inputs(inputs)
# Set the search options for the generator parameters
# The max_length parameter specifies the maximum length of the generated output
params.set_search_options(max_length=3072)
generator = og.Generator(model, params)
# Accumulate the decoded response in this string
response = ''
# Loop until the generator has finished generating tokens
while not generator.is_done():
    # Compute the logits (raw scores) for the next token
    generator.compute_logits()
    # Generate the next token based on the computed logits
    generator.generate_next_token()
    # Retrieve the newly generated token
    new_token = generator.get_next_tokens()[0]
    # Decode the new token once, append it to the response, and stream it to the console
    piece = tokenizer_stream.decode(new_token)
    response += piece
    print(piece, end='', flush=True)