Mergekit source
This is really cool! This is the perfect example of what I’ve been wanting to do for the past month. I was wondering if you’re going to open-source the modified mergekit code. Also, I would love to collaborate with you. We could use TPUs to train more expert parameters to squeeze out more performance. Looking forward to hearing back from you.
I'd enjoy that! DM me on Twitter and we can figure out how we want to proceed. I absolutely will be sharing the mergekit changes once I can streamline the process. I will likely share them tomorrow. Looking forward to hearing from you.
Wow, I agree with @Locutusque. I think it's great to be able to implement a 'moe' with gemma.
Thank you so much.
By the way, I am participating in https://huggingface.co/somosnlp (Spanish datasets) and I would like to be able to use the modified mergekit. I hope this isn't a rude question, but roughly when could you release the code with the instructions?
Thank you so much.
It will be released sometime this afternoon. I have a few kinks I need to work out to make it more streamlined, and I'm clearing those up now.