details?
Thanks for sharing, I've been looking for some info on these mmprojector files for VLMs.
1. Why isn't this file provided in any of the original repos for these VLMs?
2. How do you know which mmprojector file best suits a given VLM? From the few papers I've read, different VLMs use different architecture designs, including customization of the multimodal projector.
3. You mentioned a latest version; is there an active discussion thread on what's being improved?
Thanks!
The last few iterations before this were done with our modified configs for Llama 3, but with the recent backend changes those were no longer necessary. As for the use case: since this was made from LLaVA 1.5 training on the Llama 3 8B Instruct model, it should be compatible with any finetunes or merges based on that model. Some of the other changes in our updated projectors include additional finetuning adjustments, such as which version of CLIP was used, which can affect the image resolution sent to the model and so forth. The training itself is being done by @weizhiwang; we at Chaotic Neutrals are simply extracting the multimodal tensors and converting them into a usable GGUF format for common backends like kcpp and lcpp.
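For anyone curious what "extracting the multimodal tensors" looks like in practice, here's a minimal sketch of the key-filtering step. This is not the actual Chaotic Neutrals pipeline; the key prefix and dummy checkpoint below are illustrative assumptions (LLaVA-style checkpoints typically store projector weights under a prefix like `model.mm_projector`), and the real conversion to GGUF is a separate step handled by tooling in llama.cpp.

```python
# Hedged sketch: splitting multimodal-projector tensors out of a
# LLaVA-style checkpoint dict before GGUF conversion.
# The prefix "model.mm_projector" is an assumption about key naming;
# check your checkpoint's actual keys before relying on it.

def extract_mm_projector(state_dict, prefix="model.mm_projector"):
    """Return only the projector tensors, with the prefix stripped."""
    return {
        k[len(prefix) + 1:]: v          # drop "model.mm_projector." from the key
        for k, v in state_dict.items()
        if k.startswith(prefix + ".")
    }

# Tiny stand-in "checkpoint" (plain lists in place of real tensors).
ckpt = {
    "model.layers.0.self_attn.q_proj.weight": [[0.0] * 8] * 8,  # LLM weight, ignored
    "model.mm_projector.0.weight": [[0.0] * 4] * 4,             # projector weight, kept
    "model.mm_projector.0.bias": [0.0] * 4,                     # projector bias, kept
}

proj = extract_mm_projector(ckpt)
print(sorted(proj))  # ['0.bias', '0.weight']
```

The surviving tensors would then be renamed and serialized by a GGUF converter so that backends like kcpp/lcpp can load them alongside the vision encoder.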
got it :)