The output image deviates significantly from the input image. No matter how you adjust the generation parameters, the resemblance just isn't there. The results are even worse than direct alignment using SigLIP, so what's the benefit of an LLM here?