|
---
license: mit
---
|
|
|
# **Phi-3.5-vision-instruct-onnx-cpu** |
|
|
|
<b>Note: This is an unofficial version, just for testing and development.</b>
|
|
|
This is the ONNX format FP32 version of Microsoft Phi-3.5-vision-instruct for CPU. You can run the steps below to convert it yourself.
|
|
|
|
|
**Convert step by step**
|
|
|
1. Installation |
|
|
|
```bash
pip install torch transformers onnx onnxruntime
pip install --pre onnxruntime-genai
```
|
|
|
2. Set up the working folder in the terminal
|
|
|
|
|
```bash
mkdir models
cd models
```
|
|
|
|
|
|
|
3. Download **microsoft/Phi-3.5-vision-instruct** into the models folder
|
|
|
[https://huggingface.co/microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct) |
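One way to download it from the terminal (a sketch, assuming the `huggingface-cli` tool from the `huggingface_hub` package is installed; the target folder name is just an example):

```bash
# Download the original model into ./Phi-3.5-vision-instruct inside the models folder
huggingface-cli download microsoft/Phi-3.5-vision-instruct --local-dir ./Phi-3.5-vision-instruct
```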
|
|
|
|
|
|
|
4. Download these files into your Phi-3.5-vision-instruct folder
|
|
|
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json |
|
|
|
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/image_embedding_phi3_v_for_onnx.py |
|
|
|
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/modeling_phi3_v.py |
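One way to fetch them from the terminal (a sketch, assuming `wget` is available and that the model was downloaded into `Phi-3.5-vision-instruct` as in the previous step; note that the raw files are served from the `/resolve/` URLs rather than the `/blob/` pages):

```bash
# Run from inside the downloaded Phi-3.5-vision-instruct folder
cd Phi-3.5-vision-instruct
wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/config.json
wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/image_embedding_phi3_v_for_onnx.py
wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/modeling_phi3_v.py
```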
|
|
|
|
|
5. Download this file into the models folder
|
|
|
https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/blob/main/onnx/build.py |
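Again, a sketch assuming `wget`, using the `/resolve/` URL for the raw file:

```bash
# Run from inside the models folder
wget https://huggingface.co/lokinfey/Phi-3.5-vision-instruct-onnx-cpu/resolve/main/onnx/build.py
```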
|
|
|
6. Go to terminal |
|
|
|
|
|
|
|
Convert the model to ONNX with FP32 precision:
|
|
|
```bash
python build.py -i ".\Your Phi-3.5-vision-instruct Path" -o .\vision-cpu-fp32 -p f32 -e cpu
```
|
|
|
|
|
|
|
**Running it with ONNX Runtime GenAI**
|
|
|
|
|
```python
import onnxruntime_genai as og

# Path to the converted ONNX model folder
model_path = './Your Phi-3.5-vision-instruct Path'

# Path to the image file used for this example
img_path = './Your Image Path'

# Load the ONNX model
model = og.Model(model_path)

# Create a multimodal processor that handles both text and image inputs
processor = model.create_multimodal_processor()

# Create a tokenizer stream for decoding generated tokens one at a time
tokenizer_stream = processor.create_stream()

text = "Your Prompt"

# Build the chat prompt: user tag, image placeholder, text prompt, end tag,
# then the assistant tag marking the start of the model's response
prompt = "<|user|>\n"
prompt += "<|image_1|>\n"
prompt += f"{text}<|end|>\n"
prompt += "<|assistant|>\n"

# Load the image and preprocess it together with the prompt
image = og.Images.open(img_path)
inputs = processor(prompt, images=image)

# Set up the generator parameters with the processed inputs
params = og.GeneratorParams(model)
params.set_inputs(inputs)

# max_length limits the total length of the generated sequence
params.set_search_options(max_length=3072)

generator = og.Generator(model, params)

# Accumulate the decoded response here
response = ''

# Loop until the generator has finished generating tokens
while not generator.is_done():
    # Compute the logits for the next token
    generator.compute_logits()

    # Generate the next token based on the computed logits
    generator.generate_next_token()

    # Retrieve the newly generated token
    new_token = generator.get_next_tokens()[0]

    # Decode the token, append it to the response, and print it without a newline
    decoded = tokenizer_stream.decode(new_token)
    response += decoded
    print(decoded, end='', flush=True)
```
|
|
|
|
|
|
|
|