From 3rd_ckpt, the model was trained on 20 million tokens, not 30 million tokens.