---
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
---
This repository is a pre-release checkpoint for Llama 3.2 11B Vision Instruct. It contains two versions of the model: one for use with `transformers`, and one for use with the original `llama3` codebase (under the `original` directory).
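
If you only need one of the two formats, a minimal sketch for fetching just the original-format weights (assuming the `original/` directory layout described above and the `huggingface_hub` library) is:

```python
from huggingface_hub import snapshot_download

# Download only the files under original/ (the llama3-codebase checkpoints).
# Drop allow_patterns to also fetch the transformers-format weights.
local_path = snapshot_download(
    repo_id="nltpt/Llama-3.2-11B-Vision-Instruct",
    allow_patterns=["original/*"],
)
print(local_path)
```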
## Inference with `transformers`
Please install the in-progress development wheel from https://huggingface.co/nltpt/transformers/tree/main.
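
Once the wheel is installed, a quick sanity check (a sketch, not part of the official setup) is to confirm that the classes used below import cleanly:

```python
# The development build should expose the Mllama classes this snippet relies on.
import transformers
from transformers import MllamaForConditionalGeneration, AutoProcessor

print(transformers.__version__)
```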
This is an example inference snippet (API subject to change):
```python
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "nltpt/Llama-3.2-11B-Vision-Instruct"

# Load the model in bfloat16 and let transformers place it on available devices.
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)
processor = AutoProcessor.from_pretrained(model_id)

# A single user turn: one image placeholder followed by the text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe image in two sentences"},
        ],
    }
]
text = processor.apply_chat_template(messages, add_generation_prompt=True)

url = "https://llava-vl.github.io/static/images/view.jpg"
raw_image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=text, images=raw_image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=False, max_new_tokens=25)
print(processor.decode(output[0]))
```
Output:
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
<|image|>Describe image in two sentences<|eot_id|><|start_header_id|>assistant<|end_header_id|>
The image depicts a serene lake scene, featuring a long wooden dock extending into the calm water, with a dense forest of trees
```
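
The default decode above includes the chat template and special tokens. If you only want the assistant's reply, one option (a sketch reusing the `inputs` and `output` variables from the snippet above; the API is subject to change) is to slice off the prompt tokens before decoding:

```python
# Keep only the tokens generated after the prompt, then decode without special tokens.
prompt_len = inputs["input_ids"].shape[-1]
generated_tokens = output[0][prompt_len:]
print(processor.decode(generated_tokens, skip_special_tokens=True))
```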
## Running the original checkpoints
The installed `llama-models` package provides three binaries:
- `example_chat_completion`
- `example_text_completion`
- `multimodal_example_chat_completion`
You can invoke them via `torchrun` as follows:
```bash
CHECKPOINT_DIR=~/.llama/checkpoints/Llama-3.2-11B-Vision-Instruct/
torchrun `which multimodal_example_chat_completion` "$CHECKPOINT_DIR"
```
You can find the source code for these scripts with something like:
```bash
PACKAGE_DIR=$(pip show -f llama-models | grep Location | awk '{ print $2 }')
echo "Scripts are in the directory: $PACKAGE_DIR/llama-models/scripts/"
```