---
library_name: peft
base_model: Qwen/Qwen-VL-Chat-Int4
---

# Model Card for Qwen-VL-FNCall-qlora

## Model Details

### Model Description

- **Developed by:** Agora Research
- **Model type:** Vision Language Model
- **Language(s) (NLP):** English/Chinese
- **Finetuned from model:** Qwen-VL

### Model Sources

- **Repository:** https://github.com/QwenLM/Qwen-VL
- **Paper:** https://arxiv.org/pdf/2308.12966.pdf

## Uses

```
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
from transformers.generation import GenerationConfig

# Note: the default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL", trust_remote_code=True)

model = AutoPeftModelForCausalLM.from_pretrained(
    "Qwen-VL-FNCall-qlora/",  # path to the adapter output directory
    device_map="cuda",
    fp16=True,
    trust_remote_code=True,
).eval()

# Specify hyperparameters for generation (generation_config if transformers < 4.32.0):
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

# 1st dialogue turn
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # either a local path or a URL
    {'text': "[FUNCTION CALL]"},
])

response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```

This prints results of the form:

```
[FUNCTION CALL]
{
  'type': 'object',
  'properties': {
    'puppy_colors': {
      'type': 'array',
      'description': 'The colors of the puppies in the image.',
      'items': {'type': 'string', 'enum': ['golden']}
    },
    'puppy_posture': {
      'type': 'string',
      'description': 'The posture of the puppies in the image.',
      'enum': ['sitting']
    },
    'puppy_expression': {
      'type': 'string',
      'description': 'The expression of the puppies in the image.',
      'enum': ['smiling']
    },
    'puppy_location': {
      'type': 'string',
      'description': 'The location of the puppies in the image.',
      'enum': ['on a green field with orange flowers']
    },
    'puppy_background': {
      'type': 'string',
      'description': 'The background of the puppies in the image.',
      'enum': ['green field with orange flowers']
    }
  }
}

[EXPECTED OUTPUT]
{
  'puppy_colors': ['golden'],
  'puppy_posture': 'sitting',
  'puppy_expression': 'smiling',
  'puppy_location': 'on a green field with orange flowers',
  'puppy_background': 'green field with orange flowers'
}
```

### Direct Use

Send an image and include `[FUNCTION CALL]` in the text prompt. The adapter can also be used for normal Qwen-VL inference.

### Recommendations

transformers >= 4.32.0 is recommended.

## How to Get Started with the Model

```
query = tokenizer.from_list_format([
    {'image': 'https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIzLTA4L3Jhd3BpeGVsX29mZmljZV8xNV9waG90b19vZl9hX2RvZ19ydW5uaW5nX3dpdGhfb3duZXJfYXRfcGFya19lcF9mM2I3MDQyZC0zNWJlLTRlMTQtOGZhNy1kY2Q2OWQ1YzQzZjlfMi5qcGc.jpg'},  # either a local path or a URL
    {'text': "[FUNCTION CALL]"},
])
```

## Training Details

### Training Data

https://huggingface.co/datasets/AgoraX/OpenImage-FNCall-50k

### Training Procedure

QLoRA for 1 epoch (1,000 steps).

### Framework versions

- PEFT 0.7.1
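The training setup above is only summarized. For orientation, here is a minimal sketch of what a QLoRA adapter setup with PEFT could look like; the rank, alpha, dropout, and `target_modules` values below are assumptions, not the actual values used for this adapter.

```
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the Int4-quantized base model; QLoRA trains low-rank adapters on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat-Int4",
    device_map="cuda",
    trust_remote_code=True,
)

# Hypothetical adapter configuration: r, lora_alpha, lora_dropout, and
# target_modules are illustrative assumptions, not the published values.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "attn.c_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # confirm only the adapter weights are trainable
```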
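As shown in the sample output above, the `[FUNCTION CALL]` responses are Python-style single-quoted dicts rather than strict JSON. Below is a minimal parsing sketch; the `parse_response` helper is hypothetical and assumes the `[FUNCTION CALL]` / `[EXPECTED OUTPUT]` layout shown earlier.

```
import ast

def parse_response(response: str) -> dict:
    # Split the response into the schema and the (optional) expected-output part.
    schema_part, _, output_part = response.partition("[EXPECTED OUTPUT]")
    schema_str = schema_part.split("[FUNCTION CALL]")[-1].strip()
    # The model emits single-quoted Python-style dicts, so use ast.literal_eval
    # rather than json.loads.
    parsed = {"schema": ast.literal_eval(schema_str)}
    if output_part.strip():
        parsed["output"] = ast.literal_eval(output_part.strip())
    return parsed

# `response` comes from model.chat(...) in the Uses section above.
parsed = parse_response(response)
print(parsed["output"]["puppy_posture"])  # e.g. 'sitting'
```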