bigscience-bot committed on
Commit
be527c9
·
1 Parent(s): 5d36908
Files changed (1)
  1. logs/main_log.txt +66 -0
logs/main_log.txt CHANGED
@@ -116234,3 +116234,69 @@ time (ms)
116234
  time (ms)
116235
  iteration 3107/ 292968 | consumed samples: 6363136 | consumed tokens: 927023104 | elapsed time per iteration (ms): 110093.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.565519E+00 | loss scale: 131072.0 | grad norm: 33777.532 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116236
  time (ms)
116237
+ iteration 3108/ 292968 | consumed samples: 6365184 | consumed tokens: 927498240 | elapsed time per iteration (ms): 110410.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.537144E+00 | loss scale: 131072.0 | grad norm: 40416.615 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116238
+ time (ms)
116239
+ iteration 3109/ 292968 | consumed samples: 6367232 | consumed tokens: 927973376 | elapsed time per iteration (ms): 111377.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.527619E+00 | loss scale: 131072.0 | grad norm: 46857.043 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116240
+ time (ms)
116241
+ iteration 3110/ 292968 | consumed samples: 6369280 | consumed tokens: 928448512 | elapsed time per iteration (ms): 110496.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.555134E+00 | loss scale: 131072.0 | grad norm: 60420.377 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116242
+ time (ms)
116243
+ iteration 3111/ 292968 | consumed samples: 6371328 | consumed tokens: 928923648 | elapsed time per iteration (ms): 111479.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.542067E+00 | loss scale: 131072.0 | grad norm: 47053.293 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116244
+ time (ms)
116245
+ iteration 3112/ 292968 | consumed samples: 6373376 | consumed tokens: 929398784 | elapsed time per iteration (ms): 109535.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.538254E+00 | loss scale: 131072.0 | grad norm: 41897.336 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116246
+ time (ms)
116247
+ iteration 3113/ 292968 | consumed samples: 6375424 | consumed tokens: 929873920 | elapsed time per iteration (ms): 112139.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.540478E+00 | loss scale: 131072.0 | grad norm: 43233.715 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116248
+ time (ms)
116249
+ iteration 3114/ 292968 | consumed samples: 6377472 | consumed tokens: 930349056 | elapsed time per iteration (ms): 109783.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.507294E+00 | loss scale: 131072.0 | grad norm: 38971.265 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116250
+ time (ms)
116251
+ iteration 3115/ 292968 | consumed samples: 6379520 | consumed tokens: 930824192 | elapsed time per iteration (ms): 109544.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.527198E+00 | loss scale: 131072.0 | grad norm: 39431.657 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116252
+ time (ms)
116253
+ iteration 3116/ 292968 | consumed samples: 6381568 | consumed tokens: 931299328 | elapsed time per iteration (ms): 109791.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.543795E+00 | loss scale: 131072.0 | grad norm: 35911.906 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116254
+ time (ms)
116255
+ iteration 3117/ 292968 | consumed samples: 6383616 | consumed tokens: 931774464 | elapsed time per iteration (ms): 109068.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.553530E+00 | loss scale: 131072.0 | grad norm: 31794.593 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116256
+ time (ms)
116257
+ iteration 3118/ 292968 | consumed samples: 6385664 | consumed tokens: 932249600 | elapsed time per iteration (ms): 111130.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.521324E+00 | loss scale: 131072.0 | grad norm: 37780.759 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116258
+ time (ms)
116259
+ iteration 3119/ 292968 | consumed samples: 6387712 | consumed tokens: 932724736 | elapsed time per iteration (ms): 110038.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.533055E+00 | loss scale: 131072.0 | grad norm: 36496.675 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116260
+ time (ms)
116261
+ iteration 3120/ 292968 | consumed samples: 6389760 | consumed tokens: 933199872 | elapsed time per iteration (ms): 110677.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.529382E+00 | loss scale: 131072.0 | grad norm: 35531.822 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116262
+ time (ms)
116263
+ iteration 3121/ 292968 | consumed samples: 6391808 | consumed tokens: 933675008 | elapsed time per iteration (ms): 111492.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.548238E+00 | loss scale: 131072.0 | grad norm: 44060.029 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116264
+ time (ms)
116265
+ iteration 3122/ 292968 | consumed samples: 6393856 | consumed tokens: 934150144 | elapsed time per iteration (ms): 110377.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.546454E+00 | loss scale: 131072.0 | grad norm: 50136.311 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116266
+ time (ms)
116267
+ iteration 3123/ 292968 | consumed samples: 6395904 | consumed tokens: 934625280 | elapsed time per iteration (ms): 110624.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.544405E+00 | loss scale: 131072.0 | grad norm: 58389.993 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116268
+ time (ms)
116269
+ iteration 3124/ 292968 | consumed samples: 6397952 | consumed tokens: 935100416 | elapsed time per iteration (ms): 109969.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.526251E+00 | loss scale: 131072.0 | grad norm: 46223.258 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116270
+ time (ms)
116271
+ iteration 3125/ 292968 | consumed samples: 6400000 | consumed tokens: 935575552 | elapsed time per iteration (ms): 110453.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.521111E+00 | loss scale: 131072.0 | grad norm: 47758.541 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116272
+ time (ms)
116273
+ iteration 3126/ 292968 | consumed samples: 6402048 | consumed tokens: 936050688 | elapsed time per iteration (ms): 110497.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.529240E+00 | loss scale: 131072.0 | grad norm: 43012.626 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116274
+ time (ms)
116275
+ iteration 3127/ 292968 | consumed samples: 6404096 | consumed tokens: 936525824 | elapsed time per iteration (ms): 109904.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.557906E+00 | loss scale: 131072.0 | grad norm: 38612.784 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116276
+ time (ms)
116277
+ iteration 3128/ 292968 | consumed samples: 6406144 | consumed tokens: 937000960 | elapsed time per iteration (ms): 108913.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.531578E+00 | loss scale: 131072.0 | grad norm: 41133.483 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116278
+ time (ms)
116279
+ iteration 3129/ 292968 | consumed samples: 6408192 | consumed tokens: 937476096 | elapsed time per iteration (ms): 110361.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.493152E+00 | loss scale: 131072.0 | grad norm: 37207.043 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116280
+ time (ms)
116281
+ iteration 3130/ 292968 | consumed samples: 6410240 | consumed tokens: 937951232 | elapsed time per iteration (ms): 110191.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.501123E+00 | loss scale: 131072.0 | grad norm: 38164.113 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116282
+ time (ms)
116283
+ iteration 3131/ 292968 | consumed samples: 6412288 | consumed tokens: 938426368 | elapsed time per iteration (ms): 111607.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.532626E+00 | loss scale: 131072.0 | grad norm: 33127.285 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116284
+ time (ms)
116285
+ iteration 3132/ 292968 | consumed samples: 6414336 | consumed tokens: 938901504 | elapsed time per iteration (ms): 109170.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.537379E+00 | loss scale: 131072.0 | grad norm: 34759.150 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116286
+ time (ms)
116287
+ iteration 3133/ 292968 | consumed samples: 6416384 | consumed tokens: 939376640 | elapsed time per iteration (ms): 110333.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.516551E+00 | loss scale: 131072.0 | grad norm: 34224.005 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116288
+ time (ms)
116289
+ iteration 3134/ 292968 | consumed samples: 6418432 | consumed tokens: 939851776 | elapsed time per iteration (ms): 111757.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.518162E+00 | loss scale: 131072.0 | grad norm: 36712.897 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116290
+ time (ms)
116291
+ iteration 3135/ 292968 | consumed samples: 6420480 | consumed tokens: 940326912 | elapsed time per iteration (ms): 110838.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.531169E+00 | loss scale: 131072.0 | grad norm: 45871.157 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116292
+ time (ms)
116293
+ iteration 3136/ 292968 | consumed samples: 6422528 | consumed tokens: 940802048 | elapsed time per iteration (ms): 111400.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.530567E+00 | loss scale: 131072.0 | grad norm: 40914.375 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116294
+ time (ms)
116295
+ iteration 3137/ 292968 | consumed samples: 6424576 | consumed tokens: 941277184 | elapsed time per iteration (ms): 110125.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.528954E+00 | loss scale: 131072.0 | grad norm: 46189.970 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116296
+ time (ms)
116297
+ iteration 3138/ 292968 | consumed samples: 6426624 | consumed tokens: 941752320 | elapsed time per iteration (ms): 110514.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.515621E+00 | loss scale: 131072.0 | grad norm: 47886.409 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116298
+ time (ms)
116299
+ iteration 3139/ 292968 | consumed samples: 6428672 | consumed tokens: 942227456 | elapsed time per iteration (ms): 111383.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.544991E+00 | loss scale: 131072.0 | grad norm: 44561.397 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116300
+ time (ms)
116301
+ iteration 3140/ 292968 | consumed samples: 6430720 | consumed tokens: 942702592 | elapsed time per iteration (ms): 109945.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.550672E+00 | loss scale: 131072.0 | grad norm: 55870.403 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
116302
+ time (ms)