bigscience-bot committed on
Commit
54e1d96
·
1 Parent(s): 2e8a204
Files changed (1)
  1. logs/main_log.txt +66 -0
logs/main_log.txt CHANGED
@@ -116168,3 +116168,69 @@ time (ms)
  time (ms)
  iteration 3074/ 292968 | consumed samples: 6295552 | consumed tokens: 911343616 | elapsed time per iteration (ms): 111783.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.556693E+00 | loss scale: 262144.0 | grad norm: 75327.724 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
  time (ms)
+ iteration 3075/ 292968 | consumed samples: 6297600 | consumed tokens: 911818752 | elapsed time per iteration (ms): 112802.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.532203E+00 | loss scale: 262144.0 | grad norm: 77808.092 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3076/ 292968 | consumed samples: 6299648 | consumed tokens: 912293888 | elapsed time per iteration (ms): 111886.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.536798E+00 | loss scale: 262144.0 | grad norm: 69181.190 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3077/ 292968 | consumed samples: 6301696 | consumed tokens: 912769024 | elapsed time per iteration (ms): 109788.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.529448E+00 | loss scale: 262144.0 | grad norm: 98417.458 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3078/ 292968 | consumed samples: 6303744 | consumed tokens: 913244160 | elapsed time per iteration (ms): 109949.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.558207E+00 | loss scale: 262144.0 | grad norm: 144977.392 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3079/ 292968 | consumed samples: 6305792 | consumed tokens: 913719296 | elapsed time per iteration (ms): 111355.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.542839E+00 | loss scale: 262144.0 | grad norm: 144887.374 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3080/ 292968 | consumed samples: 6307840 | consumed tokens: 914194432 | elapsed time per iteration (ms): 110023.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.568634E+00 | loss scale: 262144.0 | grad norm: 92941.350 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3081/ 292968 | consumed samples: 6309888 | consumed tokens: 914669568 | elapsed time per iteration (ms): 112256.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.564021E+00 | loss scale: 262144.0 | grad norm: 92941.350 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3082/ 292968 | consumed samples: 6311936 | consumed tokens: 915144704 | elapsed time per iteration (ms): 112050.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.573154E+00 | loss scale: 131072.0 | grad norm: 92941.350 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3083/ 292968 | consumed samples: 6313984 | consumed tokens: 915619840 | elapsed time per iteration (ms): 110067.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.587767E+00 | loss scale: 131072.0 | grad norm: 111957.591 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3084/ 292968 | consumed samples: 6316032 | consumed tokens: 916094976 | elapsed time per iteration (ms): 110023.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.549948E+00 | loss scale: 131072.0 | grad norm: 97896.334 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3085/ 292968 | consumed samples: 6318080 | consumed tokens: 916570112 | elapsed time per iteration (ms): 112215.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.590836E+00 | loss scale: 131072.0 | grad norm: 69343.441 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3086/ 292968 | consumed samples: 6320128 | consumed tokens: 917045248 | elapsed time per iteration (ms): 109520.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.571056E+00 | loss scale: 131072.0 | grad norm: 63993.728 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3087/ 292968 | consumed samples: 6322176 | consumed tokens: 917520384 | elapsed time per iteration (ms): 110110.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.561241E+00 | loss scale: 131072.0 | grad norm: 59094.102 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3088/ 292968 | consumed samples: 6324224 | consumed tokens: 917995520 | elapsed time per iteration (ms): 110599.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.568187E+00 | loss scale: 131072.0 | grad norm: 48906.726 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3089/ 292968 | consumed samples: 6326272 | consumed tokens: 918470656 | elapsed time per iteration (ms): 111035.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.589565E+00 | loss scale: 131072.0 | grad norm: 49687.770 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3090/ 292968 | consumed samples: 6328320 | consumed tokens: 918945792 | elapsed time per iteration (ms): 110708.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.530910E+00 | loss scale: 131072.0 | grad norm: 40254.463 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3091/ 292968 | consumed samples: 6330368 | consumed tokens: 919420928 | elapsed time per iteration (ms): 110329.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.551384E+00 | loss scale: 131072.0 | grad norm: 42286.336 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3092/ 292968 | consumed samples: 6332416 | consumed tokens: 919896064 | elapsed time per iteration (ms): 111680.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.581002E+00 | loss scale: 131072.0 | grad norm: 33542.876 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3093/ 292968 | consumed samples: 6334464 | consumed tokens: 920371200 | elapsed time per iteration (ms): 111080.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.548149E+00 | loss scale: 131072.0 | grad norm: 37645.822 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3094/ 292968 | consumed samples: 6336512 | consumed tokens: 920846336 | elapsed time per iteration (ms): 114983.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.561650E+00 | loss scale: 131072.0 | grad norm: 45264.420 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3095/ 292968 | consumed samples: 6338560 | consumed tokens: 921321472 | elapsed time per iteration (ms): 112957.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.530528E+00 | loss scale: 131072.0 | grad norm: 59561.033 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3096/ 292968 | consumed samples: 6340608 | consumed tokens: 921796608 | elapsed time per iteration (ms): 111460.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.529258E+00 | loss scale: 131072.0 | grad norm: 36811.120 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3097/ 292968 | consumed samples: 6342656 | consumed tokens: 922271744 | elapsed time per iteration (ms): 111452.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.556875E+00 | loss scale: 131072.0 | grad norm: 31223.968 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3098/ 292968 | consumed samples: 6344704 | consumed tokens: 922746880 | elapsed time per iteration (ms): 110963.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.560329E+00 | loss scale: 131072.0 | grad norm: 43357.196 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3099/ 292968 | consumed samples: 6346752 | consumed tokens: 923222016 | elapsed time per iteration (ms): 109338.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.528105E+00 | loss scale: 131072.0 | grad norm: 55024.526 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3100/ 292968 | consumed samples: 6348800 | consumed tokens: 923697152 | elapsed time per iteration (ms): 109711.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.541577E+00 | loss scale: 131072.0 | grad norm: 37820.128 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3101/ 292968 | consumed samples: 6350848 | consumed tokens: 924172288 | elapsed time per iteration (ms): 109829.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.521191E+00 | loss scale: 131072.0 | grad norm: 38476.139 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3102/ 292968 | consumed samples: 6352896 | consumed tokens: 924647424 | elapsed time per iteration (ms): 109162.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.541833E+00 | loss scale: 131072.0 | grad norm: 42999.338 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3103/ 292968 | consumed samples: 6354944 | consumed tokens: 925122560 | elapsed time per iteration (ms): 109209.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.527602E+00 | loss scale: 131072.0 | grad norm: 41025.070 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3104/ 292968 | consumed samples: 6356992 | consumed tokens: 925597696 | elapsed time per iteration (ms): 108775.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.519634E+00 | loss scale: 131072.0 | grad norm: 31961.766 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3105/ 292968 | consumed samples: 6359040 | consumed tokens: 926072832 | elapsed time per iteration (ms): 109808.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.545305E+00 | loss scale: 131072.0 | grad norm: 41148.398 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3106/ 292968 | consumed samples: 6361088 | consumed tokens: 926547968 | elapsed time per iteration (ms): 109343.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.534677E+00 | loss scale: 131072.0 | grad norm: 33930.951 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)
+ iteration 3107/ 292968 | consumed samples: 6363136 | consumed tokens: 927023104 | elapsed time per iteration (ms): 110093.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.565519E+00 | loss scale: 131072.0 | grad norm: 33777.532 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
+ time (ms)