microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned (Zero-Shot Classification), updated 11 days ago
GIT Collection (18 items, updated Jul 11): GIT (Generative Image-to-text Transformer) is a model for vision-language tasks such as image/video captioning and question answering.