FeynModel V 0.1
Welcome to the FeynModel repository. FeynModel is a vision-language model with the reasoning capabilities of an LLM (Large Language Model), and it aims to explore the combined power of vision and language for scientific reasoning tasks. The model is fine-tuned using the LoRA (Low-Rank Adaptation) method, optimizing it for enhanced performance across a variety of vision and language tasks.
Version 0.1 uses pretrained layers from the DaViT vision tower of Florence2-base (Microsoft) and from Gemma2-2B (Google), and was fine-tuned on the M3IT, COCO, and ScienceQA datasets. It employs an S6 block to integrate context memory for Q*TS (experimental).
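The model card does not spell out how the image features reach the language model, so the following is only a minimal conceptual sketch, assuming the usual vision-language wiring: a vision tower encodes the image into patch features, a projection maps them into the LLM embedding space, and the projected tokens are prepended to the text embeddings. Every class name and dimension below is an illustrative assumption, not the actual FeynModel code.

import torch
import torch.nn as nn

# Conceptual sketch only -- not the actual FeynModel implementation.
class ToyVisionLanguageBridge(nn.Module):
    def __init__(self, vision_dim=768, llm_dim=2048):
        super().__init__()
        # Maps vision-tower features (assumed dims) into the LLM embedding space.
        self.projector = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features, text_embeds):
        # image_features: (batch, num_patches, vision_dim)
        # text_embeds:    (batch, seq_len, llm_dim)
        visual_tokens = self.projector(image_features)
        # Prepend visual tokens so the LLM attends to them as a prefix.
        return torch.cat([visual_tokens, text_embeds], dim=1)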
How to Use
import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = 'Imagroune/feynmodel'
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load on GPU (full precision)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.to('cuda')

# Or load on CPU in bfloat16
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map='cpu',
    torch_dtype=torch.bfloat16,
)
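Use one of the two loading paths above: the CUDA load if a GPU is available (the inference examples below assume the model and inputs are on the GPU), or the CPU/bfloat16 load otherwise.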
LLM Inference
input_text = "<start_of_turn>user\nHow many helicopters can an adult human eat in a single meal?<end_of_turn>\n<start_of_turn>model\n"
input_ids = processor.tokenizer(input_text, return_tensors="pt").to("cuda")
max_length = input_ids.input_ids.shape[1] + 1024

# generate() returns one sequence per batch item; decode and print each one
stream_output = []
for output in model.generate(input_ids=input_ids.input_ids, max_length=max_length, do_sample=True, temperature=0.7):
    decoded_output = processor.tokenizer.decode(output, skip_special_tokens=True)
    stream_output.append(decoded_output)
    print(decoded_output, end="", flush=True)
It will output something like:
This is a trick question! Here's why:
* **Helicopters don't have food to eat.** Helicopters are machines that fly. They don't have mouths or stomachs!
* **Humans don't fly through food.** We eat food to give our bodies energy. But we don't eat food that we can fly through!
Let me know if you'd like to learn about how people eat different foods.
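The loop above only prints once generation has finished. For genuine token-by-token streaming, the TextStreamer helper from transformers can be passed to generate(); this is a sketch of that variant, reusing the model and processor loaded above.

from transformers import TextStreamer

streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    input_ids=input_ids.input_ids,
    max_length=max_length,
    do_sample=True,
    temperature=0.7,
    streamer=streamer,  # tokens are printed to stdout as they are generated
)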
Vision Inference
from transformers import StoppingCriteria, StoppingCriteriaList

# Streams each new token to stdout as it is generated; it never actually stops generation.
class PrintTokensStoppingCriteria(StoppingCriteria):
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, input_ids, scores, **kwargs):
        last_token_id = input_ids[0, -1].item()
        token = self.tokenizer.decode([last_token_id], skip_special_tokens=True)
        print(token, end='', flush=True)
        return False  # False means "keep generating"

stopping_criteria = PrintTokensStoppingCriteria(processor.tokenizer)
from PIL import Image
import requests

# Load an example image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

input_text = """<start_of_turn>user
Create a concise caption that accurately describes the main elements in the image provided
<end_of_turn>
<start_of_turn>model
"""

# Preprocess, move everything to the GPU, and match the model's dtype for floating-point tensors
inputs = processor(text=input_text, images=image, return_tensors="pt")
inputs = {key: value.cuda() for key, value in inputs.items()}
inputs = {key: value.to(dtype=model.dtype) if value.dtype == torch.float32 else value for key, value in inputs.items()}

max_length = inputs['input_ids'].shape[1] + 1024
ret = model.generate(
    inputs['input_ids'],
    pixel_values=inputs['pixel_values'],
    stopping_criteria=StoppingCriteriaList([stopping_criteria]),
    max_length=max_length,
    do_sample=True,
    temperature=0.7,
)
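The stopping criteria already streams tokens to stdout while the model generates; to also recover the caption as a plain string, decode the continuation (everything after the prompt) from the returned tensor:

generated_tokens = ret[0][inputs['input_ids'].shape[1]:]
caption = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(caption)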