What text can be generated?
I tried it, and it doesn't generate Chinese, Japanese, or Korean.
You are correct. This is OpenAI's original CLIP ViT-L/14 model, which predominantly knows English and is only reliable with English. I only fine-tuned it for higher accuracy (zero-shot, retrieval, or as guidance / text encoder for generative AI); I did not train on non-English languages. However, the exact code I used to fine-tune the model (especially the Geometric Parametrization modification of the model) is available on my GitHub. You can adapt it to any multilingual or non-English CLIP model, and 24 GB of VRAM is sufficient, so you only need an RTX 3090 or similar to achieve good results:
https://github.com/zer0int/CLIP-fine-tune
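For anyone unfamiliar with the "zero-shot" use mentioned above: a CLIP-style model scores an image against candidate text prompts by comparing their embeddings, then turns the scores into probabilities. Below is a minimal sketch of that scoring step only, in plain Python. The vectors are made-up toy values, not real CLIP outputs, the function names are hypothetical, and the `logit_scale` of 100 is an assumed value; this is not the fine-tuning code from the repository.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot(image_emb, text_embs, logit_scale=100.0):
    # Score the image against every prompt, then apply a softmax.
    # logit_scale is an assumed temperature, not read from a real model.
    logits = [logit_scale * cosine(image_emb, t) for t in text_embs.values()]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(text_embs, exps)}

# Toy embeddings (hypothetical); a real model would produce these.
toy_image = [0.9, 0.1, 0.2]
toy_texts = {
    "a photo of a cat": [1.0, 0.0, 0.1],
    "a photo of a dog": [0.0, 1.0, 0.1],
}

probs = zero_shot(toy_image, toy_texts)
best = max(probs, key=probs.get)
print(best)
```

Because the text encoder was trained mostly on English, prompts in other languages tend to produce less meaningful embeddings, which is why the scoring above is only reliable with English prompts for this model.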
Although I have a 4090, I'm just a hobbyist and know nothing about modifying code. Thanks for your reply!