Any comparison between the embed methods and adding pos/neg prompts?
#9 · opened by adi-kmt
I noticed that you made a phixtral model with cheap embed and no positive prompts: https://huggingface.co/mlabonne/phixtral-4x2_8/discussions/6.
Do you find that the hidden gate mode combined with positive prompts gives you better responses?
Also, do you fine-tune after merging?
Yes, pos/neg prompts are a lot better for initializing the gating weights (phixtral's are random). I didn't fine-tune them because it's quite a tricky process and I haven't managed to do it successfully so far, but in theory it should be done.
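To illustrate why prompt-based initialization helps compared with random gates, here is a minimal pure-Python sketch. It assumes each prompt has already been turned into a vector. With the hidden mode that vector would come from the base model's hidden states, and with cheap embed from the raw token embeddings. Each expert's gate row is taken as mean(positive) minus mean(negative), L2-normalized. The toy 3-dimensional vectors and every function name below are hypothetical. This is not mergekit's actual code, only a simplified model of the idea.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def normalize(vector):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else vector


def prompt_gate_vector(pos_reprs, neg_reprs=None):
    """Hypothetical gate row for one expert: mean(pos) - mean(neg), normalized."""
    gate = mean_vector(pos_reprs)
    if neg_reprs:
        neg = mean_vector(neg_reprs)
        gate = [p - q for p, q in zip(gate, neg)]
    return normalize(gate)


def random_gate_vector(dim, rng):
    """Random unit-norm gate row, standing in for random gate initialization."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def route(token_repr, gate_rows, top_k=2):
    """Score each expert by dot product and return the top_k expert indices."""
    scores = [sum(t * g for t, g in zip(token_repr, row)) for row in gate_rows]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]


# Toy "prompt representations" (made up for illustration only).
code_prompts = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]
math_prompts = [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]

# Expert 0 = code, expert 1 = math; each uses the other's prompts as negatives.
prompt_gates = [
    prompt_gate_vector(code_prompts, math_prompts),
    prompt_gate_vector(math_prompts, code_prompts),
]

rng = random.Random(0)
random_gates = [random_gate_vector(3, rng) for _ in range(2)]

print(route([0.8, 0.2, 0.0], prompt_gates, top_k=1))  # code-like token
print(route([0.1, 0.9, 0.0], prompt_gates, top_k=1))  # math-like token
print(route([0.8, 0.2, 0.0], random_gates, top_k=1))  # arbitrary expert
```

With prompt-based gates, a code-like input is routed to expert 0 and a math-like input to expert 1. With random gates the choice carries no meaning until the gates are trained, which is why fine-tuning matters more when the gates start out random.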