What coding dataset was used to train this model?
Also, if you're interested, I have two datasets for code training in case you want to make more models.
One that may lead to some loss of logical function:
https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k
And one that is meant to be lossless while preserving coding ability:
https://huggingface.co/datasets/rombodawg/LosslessMegaCodeTrainingV2_1m_Evol_Uncensored
Let's talk about it; I'm interested.
I used the 122k dataset listed on my profile.
I have checked your datasets; you should convert them to the Llama-2 format like mine. Convert them, add my dataset, and create a new combined dataset from all of them; then I can fine-tune it as soon as possible.
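For anyone following along, the conversion-and-merge step could look roughly like the sketch below. Note this is only an illustration: the column names (`instruction`, `output`) and the exact Llama-2 prompt template are assumptions, so check each dataset's actual schema before running anything like this.

```python
# Sketch: convert instruction/response records to a Llama-2-style
# prompt format and merge several datasets into one training list.
# Field names ("instruction", "output") and the template below are
# assumptions, not confirmed by this thread.

LLAMA2_TEMPLATE = "<s>[INST] {instruction} [/INST] {output} </s>"

def to_llama2(example):
    """Map one record to a single 'text' field in Llama-2 chat format."""
    return {"text": LLAMA2_TEMPLATE.format(
        instruction=example["instruction"].strip(),
        output=example["output"].strip(),
    )}

def merge(*datasets):
    """Convert each dataset and concatenate them into one list."""
    merged = []
    for ds in datasets:
        merged.extend(to_llama2(ex) for ex in ds)
    return merged

# Toy records standing in for the real Hugging Face datasets.
ds_a = [{"instruction": "Write hello world in Python.", "output": "print('hello')"}]
ds_b = [{"instruction": "Reverse a string s.", "output": "s[::-1]"}]

combined = merge(ds_a, ds_b)
print(combined[0]["text"])
```

In practice you would load the real datasets with `datasets.load_dataset`, apply the mapping with `.map(to_llama2)`, and join them with `concatenate_datasets` before pushing the result back to the Hub.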
How did you create your 122k dataset? Was it created using GPT-4 prompting, or was it sourced from somewhere on Hugging Face?
emre/llama-2-instruct-121k-code
I took it from another repo.