What coding dataset was used to train this model?
Also, if you're interested, I have two datasets for code training in case you want to make more models.
One that may lead to some loss of logical function:
https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k
And one that is meant to be lossless while preserving coding ability:
https://huggingface.co/datasets/rombodawg/LosslessMegaCodeTrainingV2_1m_Evol_Uncensored
Let's talk about it; I'm interested.
I used the 122k dataset listed on my profile.
I have checked your datasets; you should convert them to the Llama-2 format like mine. Convert them, add my dataset, and create a new combined dataset from all of them; then I can fine-tune it as soon as possible.
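For anyone following along, the conversion-and-merge step could look roughly like the sketch below. Note this is only an illustration: the column names (`instruction`, `output`) and the exact Llama-2 prompt template are assumptions, so check each dataset's actual schema before running anything like this.

```python
# Sketch: convert instruction/response records to a Llama-2-style
# prompt format and merge several datasets into one training list.
# Field names ("instruction", "output") and the template below are
# assumptions, not confirmed by this thread.

LLAMA2_TEMPLATE = "<s>[INST] {instruction} [/INST] {output} </s>"

def to_llama2(example):
    """Map one record to a single 'text' field in Llama-2 chat format."""
    return {"text": LLAMA2_TEMPLATE.format(
        instruction=example["instruction"].strip(),
        output=example["output"].strip(),
    )}

def merge(*datasets):
    """Convert each dataset and concatenate them into one list."""
    merged = []
    for ds in datasets:
        merged.extend(to_llama2(ex) for ex in ds)
    return merged

# Toy records standing in for the real Hugging Face datasets.
ds_a = [{"instruction": "Write hello world in Python.", "output": "print('hello')"}]
ds_b = [{"instruction": "Reverse a string s.", "output": "s[::-1]"}]

combined = merge(ds_a, ds_b)
print(combined[0]["text"])
```

In practice you would load the real datasets with `datasets.load_dataset`, apply the mapping with `.map(to_llama2)`, and join them with `concatenate_datasets` before pushing the result back to the Hub.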
How did you create your 122k dataset? Was it created using GPT-4 prompting, or was it sourced from somewhere on Hugging Face?
emre/llama-2-instruct-121k-code
I took it from another repo.