bigscience-bot committed 54e1d96 (parent: 2e8a204)
new data

logs/main_log.txt CHANGED +66 -0
@@ -116168,3 +116168,69 @@ time (ms)
 time (ms)
 iteration 3074/ 292968 | consumed samples: 6295552 | consumed tokens: 911343616 | elapsed time per iteration (ms): 111783.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.556693E+00 | loss scale: 262144.0 | grad norm: 75327.724 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
 time (ms)
+iteration 3075/ 292968 | consumed samples: 6297600 | consumed tokens: 911818752 | elapsed time per iteration (ms): 112802.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.532203E+00 | loss scale: 262144.0 | grad norm: 77808.092 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3076/ 292968 | consumed samples: 6299648 | consumed tokens: 912293888 | elapsed time per iteration (ms): 111886.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.536798E+00 | loss scale: 262144.0 | grad norm: 69181.190 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3077/ 292968 | consumed samples: 6301696 | consumed tokens: 912769024 | elapsed time per iteration (ms): 109788.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.529448E+00 | loss scale: 262144.0 | grad norm: 98417.458 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3078/ 292968 | consumed samples: 6303744 | consumed tokens: 913244160 | elapsed time per iteration (ms): 109949.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.558207E+00 | loss scale: 262144.0 | grad norm: 144977.392 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3079/ 292968 | consumed samples: 6305792 | consumed tokens: 913719296 | elapsed time per iteration (ms): 111355.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.542839E+00 | loss scale: 262144.0 | grad norm: 144887.374 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3080/ 292968 | consumed samples: 6307840 | consumed tokens: 914194432 | elapsed time per iteration (ms): 110023.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.568634E+00 | loss scale: 262144.0 | grad norm: 92941.350 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3081/ 292968 | consumed samples: 6309888 | consumed tokens: 914669568 | elapsed time per iteration (ms): 112256.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.564021E+00 | loss scale: 262144.0 | grad norm: 92941.350 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3082/ 292968 | consumed samples: 6311936 | consumed tokens: 915144704 | elapsed time per iteration (ms): 112050.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.573154E+00 | loss scale: 131072.0 | grad norm: 92941.350 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3083/ 292968 | consumed samples: 6313984 | consumed tokens: 915619840 | elapsed time per iteration (ms): 110067.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.587767E+00 | loss scale: 131072.0 | grad norm: 111957.591 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3084/ 292968 | consumed samples: 6316032 | consumed tokens: 916094976 | elapsed time per iteration (ms): 110023.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.549948E+00 | loss scale: 131072.0 | grad norm: 97896.334 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3085/ 292968 | consumed samples: 6318080 | consumed tokens: 916570112 | elapsed time per iteration (ms): 112215.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.590836E+00 | loss scale: 131072.0 | grad norm: 69343.441 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3086/ 292968 | consumed samples: 6320128 | consumed tokens: 917045248 | elapsed time per iteration (ms): 109520.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.571056E+00 | loss scale: 131072.0 | grad norm: 63993.728 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3087/ 292968 | consumed samples: 6322176 | consumed tokens: 917520384 | elapsed time per iteration (ms): 110110.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.561241E+00 | loss scale: 131072.0 | grad norm: 59094.102 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3088/ 292968 | consumed samples: 6324224 | consumed tokens: 917995520 | elapsed time per iteration (ms): 110599.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.568187E+00 | loss scale: 131072.0 | grad norm: 48906.726 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3089/ 292968 | consumed samples: 6326272 | consumed tokens: 918470656 | elapsed time per iteration (ms): 111035.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.589565E+00 | loss scale: 131072.0 | grad norm: 49687.770 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3090/ 292968 | consumed samples: 6328320 | consumed tokens: 918945792 | elapsed time per iteration (ms): 110708.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.530910E+00 | loss scale: 131072.0 | grad norm: 40254.463 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3091/ 292968 | consumed samples: 6330368 | consumed tokens: 919420928 | elapsed time per iteration (ms): 110329.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.551384E+00 | loss scale: 131072.0 | grad norm: 42286.336 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3092/ 292968 | consumed samples: 6332416 | consumed tokens: 919896064 | elapsed time per iteration (ms): 111680.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.581002E+00 | loss scale: 131072.0 | grad norm: 33542.876 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3093/ 292968 | consumed samples: 6334464 | consumed tokens: 920371200 | elapsed time per iteration (ms): 111080.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.548149E+00 | loss scale: 131072.0 | grad norm: 37645.822 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3094/ 292968 | consumed samples: 6336512 | consumed tokens: 920846336 | elapsed time per iteration (ms): 114983.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.561650E+00 | loss scale: 131072.0 | grad norm: 45264.420 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3095/ 292968 | consumed samples: 6338560 | consumed tokens: 921321472 | elapsed time per iteration (ms): 112957.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.530528E+00 | loss scale: 131072.0 | grad norm: 59561.033 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3096/ 292968 | consumed samples: 6340608 | consumed tokens: 921796608 | elapsed time per iteration (ms): 111460.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.529258E+00 | loss scale: 131072.0 | grad norm: 36811.120 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3097/ 292968 | consumed samples: 6342656 | consumed tokens: 922271744 | elapsed time per iteration (ms): 111452.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.556875E+00 | loss scale: 131072.0 | grad norm: 31223.968 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3098/ 292968 | consumed samples: 6344704 | consumed tokens: 922746880 | elapsed time per iteration (ms): 110963.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.560329E+00 | loss scale: 131072.0 | grad norm: 43357.196 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3099/ 292968 | consumed samples: 6346752 | consumed tokens: 923222016 | elapsed time per iteration (ms): 109338.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.528105E+00 | loss scale: 131072.0 | grad norm: 55024.526 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3100/ 292968 | consumed samples: 6348800 | consumed tokens: 923697152 | elapsed time per iteration (ms): 109711.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.541577E+00 | loss scale: 131072.0 | grad norm: 37820.128 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3101/ 292968 | consumed samples: 6350848 | consumed tokens: 924172288 | elapsed time per iteration (ms): 109829.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.521191E+00 | loss scale: 131072.0 | grad norm: 38476.139 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3102/ 292968 | consumed samples: 6352896 | consumed tokens: 924647424 | elapsed time per iteration (ms): 109162.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.541833E+00 | loss scale: 131072.0 | grad norm: 42999.338 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3103/ 292968 | consumed samples: 6354944 | consumed tokens: 925122560 | elapsed time per iteration (ms): 109209.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.527602E+00 | loss scale: 131072.0 | grad norm: 41025.070 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3104/ 292968 | consumed samples: 6356992 | consumed tokens: 925597696 | elapsed time per iteration (ms): 108775.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.519634E+00 | loss scale: 131072.0 | grad norm: 31961.766 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3105/ 292968 | consumed samples: 6359040 | consumed tokens: 926072832 | elapsed time per iteration (ms): 109808.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.545305E+00 | loss scale: 131072.0 | grad norm: 41148.398 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3106/ 292968 | consumed samples: 6361088 | consumed tokens: 926547968 | elapsed time per iteration (ms): 109343.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.534677E+00 | loss scale: 131072.0 | grad norm: 33930.951 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 3107/ 292968 | consumed samples: 6363136 | consumed tokens: 927023104 | elapsed time per iteration (ms): 110093.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.565519E+00 | loss scale: 131072.0 | grad norm: 33777.532 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
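Every iteration line in this log uses the same pipe-separated `key: value` layout, and the counters are internally consistent. Each step adds the global batch size (2048) to `consumed samples` and 2048 × 232 = 475136 to `consumed tokens`, where 232 is the curriculum sequence length. The `loss scale` halves from 262144.0 to 131072.0 at iteration 3082. Assuming the run uses fp16 dynamic loss scaling, a drop in the scale normally means a gradient overflow was detected. Below is a minimal Python sketch that parses such lines and audits those counters. It assumes every field after the first is a numeric `key: value` pair. The names `parse_iteration_line`, `audit` and `EXCERPT` are hypothetical and are not part of Megatron-DeepSpeed or any other library.

```python
from typing import Dict, List

# Hypothetical helpers; field names are copied verbatim from the log above.
# Assumption: every field after "iteration N/ M" is a numeric "key: value" pair.

# Three consecutive iteration lines copied from the log (3081-3083).
EXCERPT = [
    "iteration 3081/ 292968 | consumed samples: 6309888 | consumed tokens: 914669568 | "
    "elapsed time per iteration (ms): 112256.0 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.564021E+00 | loss scale: 262144.0 | "
    "grad norm: 92941.350 | num zeros: 0.0 | curriculum seqlen: 232 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |",
    "iteration 3082/ 292968 | consumed samples: 6311936 | consumed tokens: 915144704 | "
    "elapsed time per iteration (ms): 112050.2 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.573154E+00 | loss scale: 131072.0 | "
    "grad norm: 92941.350 | num zeros: 0.0 | curriculum seqlen: 232 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |",
    "iteration 3083/ 292968 | consumed samples: 6313984 | consumed tokens: 915619840 | "
    "elapsed time per iteration (ms): 110067.4 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.587767E+00 | loss scale: 131072.0 | "
    "grad norm: 111957.591 | num zeros: 0.0 | curriculum seqlen: 232 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |",
]


def parse_iteration_line(line: str) -> Dict[str, float]:
    """Turn 'iteration N/ M | key: value | ...' into a dict of numbers."""
    parts = [p.strip() for p in line.split("|") if p.strip()]
    if not parts or not parts[0].startswith("iteration"):
        raise ValueError(f"not an iteration line: {line[:40]!r}")
    # parts[0] looks like "iteration 3081/ 292968"
    current, _, total = parts[0][len("iteration"):].partition("/")
    record: Dict[str, float] = {
        "iteration": int(current),
        "total iterations": int(total),
    }
    for part in parts[1:]:
        key, sep, value = part.partition(":")
        if sep:  # ignore anything that is not a "key: value" pair
            record[key.strip()] = float(value)
    return record


def audit(records: List[Dict[str, float]]) -> List[str]:
    """Compare consecutive iterations and describe anything unexpected.

    Assumption: tokens per step = global batch size * curriculum seqlen of
    that step, which holds for every consecutive pair in this log.
    """
    messages: List[str] = []
    for prev, cur in zip(records, records[1:]):
        it = cur["iteration"]
        batch = cur["global batch size"]
        tokens_per_step = batch * cur["curriculum seqlen"]
        if cur["consumed samples"] - prev["consumed samples"] != batch:
            messages.append(f"iteration {it}: consumed samples did not grow by {batch}")
        if cur["consumed tokens"] - prev["consumed tokens"] != tokens_per_step:
            messages.append(f"iteration {it}: consumed tokens did not grow by {tokens_per_step}")
        if cur["loss scale"] != prev["loss scale"]:
            messages.append(
                f"iteration {it}: loss scale {prev['loss scale']} -> {cur['loss scale']}"
            )
    return messages


if __name__ == "__main__":
    records = [parse_iteration_line(line) for line in EXCERPT]
    for msg in audit(records):
        print(msg)  # prints: iteration 3082: loss scale 262144.0 -> 131072.0
```

On this excerpt the sample and token counters check out, so the only message reports the loss-scale change at iteration 3082. The `time (ms)` lines carry no `key: value` pairs here and should be filtered out before parsing, because `parse_iteration_line` rejects them with `ValueError`.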