Mistroll 3.0 Vision

Trained on LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Uploaded model

Developed by: BarraHome
License: apache-2.0
Finetuned from model: unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit

Example output:

Testing with sample question: Please answer the question below, explaining your reasoning step by step before providing the final answer.

Question:

Add 4 small gray matte things. How many small gray matte things are left?
<SUMMARY>The question asks to add 4 small gray matte objects to a scene containing an unknown number of small gray matte objects.  The solution involves identifying the objects in the provided image and then determining the total number of small gray matte objects after adding the additional 4.</SUMMARY>

<CAPTION>The image contains several 3D objects: a gray cylinder with a handle, a yellow cube, a green sphere, and a bronze sphere. The objects are arranged on a gray background.</CAPTION>

<REASONING>1. **Object Identification:**  A careful examination of the provided image reveals the following objects:
    * A gray cylinder with a handle.
    * A yellow cube.
    * A green sphere.
    * A bronze sphere.

2. **Counting Small Gray Matte Objects:** We need to count how many of the objects identified are small gray matte. The gray cylinder and the gray cube are both small gray matte. There are no other gray matte objects visible in the image.

3. **Adding 4 Small Gray Matte Objects:**  The question states that we need to add 4 small gray matte objects to the scene.  We will add these to the 2 small gray matte objects that we found in the image.

4. **Calculating the Total:** The total number of small gray matte objects will be the sum of the objects found in the image (2) and the additional objects added to the scene (4).

5. **Final Count:**  Therefore, after adding 4 small gray matte objects, there will be a total of 6 small gray matte objects.</REASONING>

<CONCLUSION>6</CONCLUSION>
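The output above follows a four-stage tagged layout (SUMMARY, CAPTION, REASONING, CONCLUSION), so the final answer can be pulled out programmatically. Below is a minimal sketch, not part of the model's own tooling: the helper names `build_prompt` and `parse_stages` are hypothetical, the prompt wording is copied from the sample question above, and the `sample` string is a shortened stand-in for real model output. It assumes each tag appears at most once and is properly closed.

```python
import re

# Instruction text copied from the sample question shown above.
PROMPT_PREFIX = (
    "Please answer the question below, explaining your reasoning "
    "step by step before providing the final answer."
)

# Stage tags as they appear in the example output.
TAGS = ("SUMMARY", "CAPTION", "REASONING", "CONCLUSION")


def build_prompt(question: str) -> str:
    """Hypothetical helper: format a question like the sample prompt."""
    return f"{PROMPT_PREFIX}\n\nQuestion:\n\n{question}"


def parse_stages(output: str) -> dict:
    """Hypothetical helper: map each stage name to its text, or None if missing."""
    stages = {}
    for tag in TAGS:
        # DOTALL lets a stage span several lines, as REASONING does above.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", output, flags=re.DOTALL)
        stages[tag.lower()] = match.group(1).strip() if match else None
    return stages


# Shortened stand-in for a real generation, used only for illustration.
sample = (
    "<SUMMARY>Count the objects, then add 4.</SUMMARY>\n\n"
    "<CAPTION>Several 3D objects on a gray background.</CAPTION>\n\n"
    "<REASONING>2 objects found, plus 4 added.</REASONING>\n\n"
    "<CONCLUSION>6</CONCLUSION>"
)

stages = parse_stages(sample)
print(stages["conclusion"])
```

Returning None for a missing stage, rather than raising, makes it easy to detect truncated generations (for example, when the token limit cuts off the CONCLUSION tag).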

This mllama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

BarraHome/Mistroll-3.0-CoT-Llama-3.2-11B-Vision-Instruct
