Abstract
PaliGemma is an open Vision-Language Model (VLM) based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile, broadly knowledgeable base model that transfers effectively, and it achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks, including standard VLM benchmarks as well as more specialized tasks such as remote sensing and segmentation.
Community
also read hf.co/blog/paligemma
are the finetuned models going to be available on huggingface?
I think they are already available:
https://huggingface.co/collections/google/paligemma-release-6643a9ffbf57de2ae0448dda
https://huggingface.co/collections/google/paligemma-ft-models-6643b03efb769dad650d2dda