|
---
license: mit
---
|
|
|
# **Phi-3.5-vision-instruct-onnx-cpu** |
|
|
|
<b>Note: This is an unofficial version, just for testing and development.</b>
|
|
|
This is the ONNX format FP32 version of Microsoft Phi-3.5-vision-instruct for CPU. You can run the steps below to convert it yourself.
|
|
|
|
|
**Convert step by step**
|
|
|
1. Installation |
|
|
|
```bash
pip install torch transformers onnx onnxruntime
pip install --pre onnxruntime-genai
```
|
|
|
2. Set up the working folder in the terminal
|
|
|
|
|
```bash
mkdir models
cd models
```
|
|
|
|
|
|
|
3. Download **microsoft/Phi-3.5-vision-instruct** into the models folder
|
|
|
[https://huggingface.co/microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct) |
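One way to download it from the terminal (a sketch, assuming the `huggingface-cli` tool from the `huggingface_hub` package is installed; the target folder name is just an example):

```bash
# Download the original model into ./Phi-3.5-vision-instruct inside the models folder
huggingface-cli download microsoft/Phi-3.5-vision-instruct --local-dir ./Phi-3.5-vision-instruct
```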
|
|
|
|
|
|
|
4. Download these files into your Phi-3.5-vision-instruct folder
|
|
|
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json |
|
|
|
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/image_embedding_phi3_v_for_onnx.py |
|
|
|
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/modeling_phi3_v.py |
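One way to fetch them from the terminal (a sketch, assuming `wget` is available and that the model was downloaded into `Phi-3.5-vision-instruct` as in the previous step; note that the raw files are served from the `/resolve/` URLs rather than the `/blob/` pages):

```bash
# Run from inside the downloaded Phi-3.5-vision-instruct folder
cd Phi-3.5-vision-instruct
wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json
wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/image_embedding_phi3_v_for_onnx.py
wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/modeling_phi3_v.py
```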
|
|
|
|
|
5. Download this file into the models folder
|
|
|
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/build.py |
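Again, a sketch assuming `wget`, using the `/resolve/` URL for the raw file:

```bash
# Run from inside the models folder
wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/build.py
```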
|
|
|
6. Go to terminal |
|
|
|
|
|
|
|
Convert the model to ONNX with FP32 precision:
|
|
|
```bash
python build.py -i ".\Your Phi-3.5-vision-instruct Path" -o .\vision-cpu-fp32 -p f32 -e cpu
```
|
|
|
|
|
|
|
**Running it with ONNX Runtime GenAI**
|
|
|
|
|
```python
import onnxruntime_genai as og

# Path to the converted ONNX model folder
model_path = './Your Phi-3.5-vision-instruct Path'

# Path to the image file used for this example
img_path = './Your Image Path'

# Load the ONNX model
model = og.Model(model_path)

# Create a multimodal processor that handles both text and image inputs
processor = model.create_multimodal_processor()

# Create a tokenizer stream for decoding generated tokens one at a time
tokenizer_stream = processor.create_stream()

text = "Your Prompt"

# Build the chat prompt: user tag, image placeholder, text prompt, end tag,
# then the assistant tag marking the start of the model's response
prompt = "<|user|>\n"
prompt += "<|image_1|>\n"
prompt += f"{text}<|end|>\n"
prompt += "<|assistant|>\n"

# Load the image and preprocess it together with the prompt
image = og.Images.open(img_path)
inputs = processor(prompt, images=image)

# Set up the generator parameters with the processed inputs
params = og.GeneratorParams(model)
params.set_inputs(inputs)

# max_length limits the total length of the generated sequence
params.set_search_options(max_length=3072)

generator = og.Generator(model, params)

# Accumulate the decoded response here
response = ''

# Loop until the generator has finished generating tokens
while not generator.is_done():
    # Compute the logits for the next token
    generator.compute_logits()

    # Generate the next token based on the computed logits
    generator.generate_next_token()

    # Retrieve the newly generated token
    new_token = generator.get_next_tokens()[0]

    # Decode the token, append it to the response, and print it without a newline
    decoded = tokenizer_stream.decode(new_token)
    response += decoded
    print(decoded, end='', flush=True)
```
|
|
|
|
|
|
|
|