---
license: mit
---

# **Phi-3.5-vision-instruct-onnx-cpu**

This is the ONNX format FP32 version of Microsoft Phi-3.5-vision-instruct for CPU inference. You can follow the steps below to convert it yourself.

**Convert step by step**

1. Installation

```bash
pip install torch transformers onnx onnxruntime
pip install --pre onnxruntime-genai
```

2. Set up a working folder in the terminal

```bash
mkdir models
cd models
```

3. Download **microsoft/Phi-3.5-vision-instruct** into the models folder

[https://huggingface.co/microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)

4. Download these files into your Phi-3.5-vision-instruct folder

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/image_embedding_phi3_v_for_onnx.py

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/modeling_phi3_v.py

5. Download this file into the models folder

https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/build.py

6. In the terminal, convert the model to ONNX with FP32

```bash
python build.py -i .\Your Phi-3.5-vision-instruct Path\ -o .\vision-cpu-fp32 -p f32 -e cpu
```

**Running it with ONNX Runtime GenAI**

```python
import onnxruntime_genai as og

# Path to the converted ONNX model folder (e.g. the vision-cpu-fp32 output from step 6)
model_path = './Your ONNX Model Path'

# Path to the image file that will be used for demonstration or testing
img_path = './Your Image Path'

# Create an instance of the Model class from the onnxruntime_genai module,
# initialized with the path to the converted model
model = og.Model(model_path)

# Create a multimodal processor using the model instance
# This processor handles different types of input data (e.g., text, images)
processor = model.create_multimodal_processor()

# Create a stream for decoding the tokens produced by the model
tokenizer_stream = processor.create_stream()

text = "Your Prompt"

# Build the chat prompt: user tag, image placeholder, prompt text, end tag,
# and the assistant tag that marks the start of the model's response
prompt = "<|user|>\n"
prompt += "<|image_1|>\n"
prompt += f"{text}<|end|>\n"
prompt += "<|assistant|>\n"

# Load the image and process it together with the prompt
image = og.Images.open(img_path)
inputs = processor(prompt, images=image)

# Create an instance of the GeneratorParams class, initialized with the model object
params = og.GeneratorParams(model)

# Set the inputs for the generator parameters using the processed inputs
params.set_inputs(inputs)

# max_length specifies the maximum length of the generated output
params.set_search_options(max_length=3072)

generator = og.Generator(model, params)

# Accumulate the decoded response here
response = ''

# Loop until the generator has finished generating tokens
while not generator.is_done():
    # Compute the logits (probabilities) for the next token
    generator.compute_logits()

    # Generate the next token based on the computed logits
    generator.generate_next_token()

    # Retrieve the newly generated token
    new_token = generator.get_next_tokens()[0]

    # Decode the new token and append it to the response string
    response += tokenizer_stream.decode(new_token)

    # Print the decoded token to the console without a newline, and flush the output buffer
    print(tokenizer_stream.decode(new_token), end='', flush=True)
```
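
If you want to ask several questions about the same image without reloading the model each time, the minimal sketch below wraps the generation loop above in a reusable helper. It uses only the onnxruntime-genai calls already shown; the `ask()` function name, the `./vision-cpu-fp32` path, and the example questions are illustrative assumptions.

```python
import onnxruntime_genai as og

# Assumed paths: the converted ONNX folder from step 6 and a local test image
model_path = './vision-cpu-fp32'
img_path = './Your Image Path'

# Load the model, processor, and image once so they can be reused across prompts
model = og.Model(model_path)
processor = model.create_multimodal_processor()
image = og.Images.open(img_path)

def ask(question: str, max_length: int = 3072) -> str:
    """Illustrative helper: run one generation pass for the loaded image and a question."""
    # Same prompt template as above: user tag, image placeholder, question, assistant tag
    prompt = f"<|user|>\n<|image_1|>\n{question}<|end|>\n<|assistant|>\n"
    inputs = processor(prompt, images=image)

    params = og.GeneratorParams(model)
    params.set_inputs(inputs)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    tokenizer_stream = processor.create_stream()

    answer = ''
    while not generator.is_done():
        generator.compute_logits()
        generator.generate_next_token()
        answer += tokenizer_stream.decode(generator.get_next_tokens()[0])
    return answer

# Example questions (illustrative)
print(ask("Describe this image."))
print(ask("What text appears in the image?"))
```

Loading `og.Model` and the processor once and reusing them avoids paying the model-load cost for every question.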