
	## Performance

	### Low-level Question-Answering

	This model reaches 75.12% / 74.98% accuracy on the Q-Bench A1 dev / test sets (multi-choice questions), which is 12% / 8.5% better than the previous version.

	It also outperforms the following models, including proprietary ones with much larger capacity:

	| Model | dev | test |
	| ---- | ---- | ---- |
	| Co-Instruct-Preview (mPLUG-Owl2) (This Model) | 75.12% | 74.98% |
	| \*GPT-4V-Turbo | 74.41% | 74.10% |
	| \*Qwen-VL-Max | 73.63% | 73.90% |
	| \*GPT-4V (Nov. 2023) | 71.78% | 73.44% |
	| \*Gemini-Pro | 68.16% | 69.46% |
	| Q-Instruct (mPLUG-Owl2, Nov. 2023) | 67.42% | 70.43% |
	| \*Qwen-VL-Plus | 66.01% | 68.93% |
	| mPLUG-Owl2 | 62.14% | 62.68% |

	\*: Proprietary Models.
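
To read the table as margins rather than raw accuracies, the sketch below subtracts each model's score from this model's. The numbers are copied from the table above; the `scores` dictionary and the `margins` helper are illustrative names and not part of the model's API.

```python
# Q-Bench A1 accuracies (dev, test) in percent, copied from the table above.
scores = {
    "Co-Instruct-Preview (mPLUG-Owl2)": (75.12, 74.98),
    "GPT-4V-Turbo": (74.41, 74.10),
    "Qwen-VL-Max": (73.63, 73.90),
    "GPT-4V (Nov. 2023)": (71.78, 73.44),
    "Gemini-Pro": (68.16, 69.46),
    "Q-Instruct (mPLUG-Owl2, Nov. 2023)": (67.42, 70.43),
    "Qwen-VL-Plus": (66.01, 68.93),
    "mPLUG-Owl2": (62.14, 62.68),
}
THIS_MODEL = "Co-Instruct-Preview (mPLUG-Owl2)"


def margins(table, reference):
    """Return {model: (dev_margin, test_margin)} in percentage points."""
    ref_dev, ref_test = table[reference]
    return {
        name: (round(ref_dev - dev, 2), round(ref_test - test, 2))
        for name, (dev, test) in table.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, (dev_gap, test_gap) in margins(scores, THIS_MODEL).items():
        print(f"{name}: +{dev_gap:.2f} dev, +{test_gap:.2f} test")
```

Every margin is positive on both splits. The smallest is against GPT-4V-Turbo, at under one percentage point.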

	### Image/Video Quality Assessment

	| Model | live | agi | livec | test_spaq | csiq | test_kadid | test_koniq | konvid | maxwell_test |
	| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
	| Co-Instruct-Preview (mPLUG-Owl2) (This Model) | 0.771/0.751 | 0.727/0.749 | 0.861/0.865 | 0.946/0.938 | 0.735/0.748 | 0.782/0.770 | 0.908/0.941 | 0.818/0.790 | 0.735/0.714 |
	| Q-Instruct (mPLUG-Owl2, Nov. 2023) | 0.749/0.747 | 0.710/0.753 | 0.781/0.791 | 0.921/0.917 | 0.693/0.723 | 0.670/0.665 | 0.904/0.921 | 0.766/0.738 | 0.650/0.649 |
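
The two rows can be compared dataset by dataset with a few lines of Python. The sketch below copies the cells from the table above. Because the card does not label the two numbers in each cell, the code simply calls them the first and second metric. `parse_row` and `compare` are illustrative helper names.

```python
# Cells copied from the table above, in column order:
# live, agi, livec, test_spaq, csiq, test_kadid, test_koniq, konvid, maxwell_test
CO_INSTRUCT = ("0.771/0.751 0.727/0.749 0.861/0.865 0.946/0.938 0.735/0.748 "
               "0.782/0.770 0.908/0.941 0.818/0.790 0.735/0.714")
Q_INSTRUCT = ("0.749/0.747 0.710/0.753 0.781/0.791 0.921/0.917 0.693/0.723 "
              "0.670/0.665 0.904/0.921 0.766/0.738 0.650/0.649")


def parse_row(row):
    """Turn 'a/b c/d ...' into [(a, b), (c, d), ...] as floats."""
    return [tuple(float(v) for v in cell.split("/")) for cell in row.split()]


def compare(new, old):
    """For each of the two metrics, count wins and average the gain."""
    summary = []
    for k in range(2):
        gains = [a[k] - b[k] for a, b in zip(new, old)]
        summary.append({
            "wins": sum(g > 0 for g in gains),
            "mean_gain": sum(gains) / len(gains),
        })
    return summary


if __name__ == "__main__":
    for k, s in enumerate(compare(parse_row(CO_INSTRUCT), parse_row(Q_INSTRUCT)), 1):
        print(f"metric {k}: {s['wins']}/9 datasets improved, "
              f"mean gain {s['mean_gain']:.3f}")
```

On the first metric this model is ahead on all nine datasets. On the second it is ahead on eight, with `agi` the exception (0.749 vs. 0.753).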


	We are also constructing multi-image benchmark sets (image pairs, triples, and quadruples); results on these benchmarks will be released soon.

	## Load Model

	```python
	import torch
	from transformers import AutoModelForCausalLM

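	# trust_remote_code=True lets transformers run the custom model code in the repository.
	# attn_implementation="flash_attention_2" requires the flash-attn package and a CUDA GPU.
	model = AutoModelForCausalLM.from_pretrained(
	    "q-future/co-instruct-preview",
	    trust_remote_code=True,
	    torch_dtype=torch.float16,
	    attn_implementation="flash_attention_2",
	    device_map={"": "cuda:0"},  # place the whole model on the first GPU
	)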
	```
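
The snippet above assumes a CUDA GPU with the `flash-attn` package installed. As a hedged sketch for other environments, the helper below chooses the loading arguments from what is available. `flash_attn_installed` and `build_load_kwargs` are hypothetical names. Whether this checkpoint runs correctly without FlashAttention-2 or on CPU is an untested assumption, not something the card states.

```python
import importlib.util


def flash_attn_installed() -> bool:
    """True if the flash-attn package can be imported in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


def build_load_kwargs(cuda_available: bool, flash_available: bool) -> dict:
    """Assemble from_pretrained keyword arguments for the current machine."""
    kwargs = {"trust_remote_code": True}
    if cuda_available:
        kwargs["device_map"] = {"": "cuda:0"}
        if flash_available:
            # Only request FlashAttention-2 when it can actually be used;
            # otherwise transformers falls back to its default attention.
            kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


# Usage (requires torch and transformers; not executed here):
#   import torch
#   from transformers import AutoModelForCausalLM
#   cuda = torch.cuda.is_available()
#   model = AutoModelForCausalLM.from_pretrained(
#       "q-future/co-instruct-preview",
#       torch_dtype=torch.float16 if cuda else torch.float32,
#       **build_load_kwargs(cuda, flash_attn_installed()),
#   )
```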

	## Chat

	```python
	import requests
	from PIL import Image


	### Single Image
	prompt = "USER: The image: <|image|> Which happens in this image: motion-blur, over-exposure, or under-exposure? ASSISTANT:"
	url = "https://raw.githubusercontent.com/Q-Future/Q-Align/main/fig/singapore_flyer.jpg"
	image = Image.open(requests.get(url,stream=True).raw)
	model.chat(prompt, [image], max_new_tokens=200)

	## Motion blur

	### Double Image Comparison
	prompt_cmp = "USER: The first image: <|image|>\nThe second image: <|image|>Which image has better quality, and why? ASSISTANT:"
	url = "https://raw.githubusercontent.com/Q-Future/Q-Align/main/fig/boy_colorful.jpg"
	image_2 = Image.open(requests.get(url,stream=True).raw)
	model.chat(prompt_cmp, [image, image_2], max_new_tokens=200)

	## The second image has better quality. The description indicates that the image has accurate exposure, precise focus, clear details, rich colors, and sufficient lighting. Additionally, the texture details are clear, and the composition is centered. In comparison, the first image has good clarity and rich texture details, but the lighting is slightly weak, which can affect the overall quality of the image. Therefore, the second image is of higher quality due to its accurate exposure, precise focus, clear details, rich colors, sufficient lighting, and centered composition.

	```
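
Both prompts above follow the same pattern: a `USER:` turn with one `<|image|>` placeholder per image, then the question, then `ASSISTANT:`. The sketch below wraps that pattern in a hypothetical `build_prompt` helper. It reproduces the two prompts from the card exactly, including the missing separator before the question in the multi-image case. Extending the wording to "third" and "fourth" is an assumption, prompted by the card's mention of triple and quadruple images.

```python
IMAGE_TOKEN = "<|image|>"
ORDINALS = ["first", "second", "third", "fourth"]


def build_prompt(question: str, n_images: int) -> str:
    """Build a chat prompt with one image placeholder per input image."""
    if n_images == 1:
        # Single-image form used in the card: "The image: <|image|> <question>"
        context = f"The image: {IMAGE_TOKEN} "
    elif 2 <= n_images <= len(ORDINALS):
        # Multi-image form: one line per image; the question follows the last
        # placeholder directly, mirroring the card's two-image prompt.
        context = "\n".join(
            f"The {ORDINALS[i]} image: {IMAGE_TOKEN}" for i in range(n_images)
        )
    else:
        raise ValueError("n_images must be between 1 and 4")
    return f"USER: {context}{question} ASSISTANT:"


if __name__ == "__main__":
    print(build_prompt("Which image has better quality, and why?", 2))
```

With the model loaded, a call would look like `model.chat(build_prompt(question, 2), [image, image_2], max_new_tokens=200)`, passing as many images as there are placeholders.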