Dataset Size
#3, opened by aslawliet
Can you tell me about the dataset size and sampling methods?
One of the datasets used to train this model, PLANE-2K, contains 2,000 rows (1.8 MB). You can use pretty much any sampling method you like, as long as you have the appropriate tools.
https://huggingface.co/datasets/Keynote-Technology/PLANE-2K
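For a dataset this small, one simple option is to load the rows into memory and draw a uniform sample without replacement. The sketch below is only an illustration, not the method used for this model: the `rows` list is a made-up stand-in for the real data, and `sample_rows`, the subset size, and the seed are all hypothetical choices.

```python
import random


def sample_rows(rows, k, seed=0):
    """Return k rows drawn uniformly at random, without replacement.

    A fixed seed makes the draw reproducible. Indices are sorted so the
    sampled rows keep their original relative order.
    """
    rng = random.Random(seed)
    indices = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in indices]


# Hypothetical stand-in for a 2,000-row dataset (not the real PLANE-2K rows).
rows = [{"id": i, "text": f"row {i}"} for i in range(2000)]

subset = sample_rows(rows, k=200)
print(len(subset))
```

Because the whole dataset fits in memory, there is no need for anything more elaborate than `random.sample` here.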
I meant: how much data did you take from RedPajama-Data-v2?
I used close to 900,000 rows from it to train this model, which comes to about 4.41 GB.
I sampled randomly in no particular order.
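For anyone wanting to reproduce something similar, here is a minimal sketch of one way to draw a uniform random sample from a corpus too large to load at once: reservoir sampling (Algorithm R). This is an assumption about how such sampling could be done, not a description of the exact procedure used here. The `fake_rows` generator is a hypothetical placeholder for a real row stream, and the sizes are scaled down.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Draw k items uniformly at random from a stream of unknown length.

    Uses Algorithm R: keep the first k items, then replace a random slot
    with item i (0-indexed) with probability k / (i + 1).
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def fake_rows(n: int) -> Iterator[dict]:
    """Hypothetical placeholder for a large corpus streamed row by row."""
    for i in range(n):
        yield {"id": i}


# Scaled-down illustration: keep 900 rows out of a 100,000-row stream.
sample = reservoir_sample(fake_rows(100_000), k=900)
print(len(sample))
```

Only `k` rows are held in memory at any time, so the same approach scales to a target of around 900,000 rows, provided the source can be iterated once from start to finish.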
PlanetDOGE changed discussion status to closed