bigscience-bot
committed on
Commit
·
be527c9
1
Parent(s):
5d36908
new data
- logs/main_log.txt +66 -0
logs/main_log.txt
CHANGED
@@ -116234,3 +116234,69 @@ time (ms)
116234 |
time (ms)
|
116235 |
iteration 3107/ 292968 | consumed samples: 6363136 | consumed tokens: 927023104 | elapsed time per iteration (ms): 110093.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.565519E+00 | loss scale: 131072.0 | grad norm: 33777.532 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116236 |
time (ms)
116237 |
+
iteration 3108/ 292968 | consumed samples: 6365184 | consumed tokens: 927498240 | elapsed time per iteration (ms): 110410.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.537144E+00 | loss scale: 131072.0 | grad norm: 40416.615 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116238 |
+
time (ms)
|
116239 |
+
iteration 3109/ 292968 | consumed samples: 6367232 | consumed tokens: 927973376 | elapsed time per iteration (ms): 111377.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.527619E+00 | loss scale: 131072.0 | grad norm: 46857.043 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116240 |
+
time (ms)
|
116241 |
+
iteration 3110/ 292968 | consumed samples: 6369280 | consumed tokens: 928448512 | elapsed time per iteration (ms): 110496.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.555134E+00 | loss scale: 131072.0 | grad norm: 60420.377 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116242 |
+
time (ms)
|
116243 |
+
iteration 3111/ 292968 | consumed samples: 6371328 | consumed tokens: 928923648 | elapsed time per iteration (ms): 111479.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.542067E+00 | loss scale: 131072.0 | grad norm: 47053.293 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116244 |
+
time (ms)
|
116245 |
+
iteration 3112/ 292968 | consumed samples: 6373376 | consumed tokens: 929398784 | elapsed time per iteration (ms): 109535.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.538254E+00 | loss scale: 131072.0 | grad norm: 41897.336 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116246 |
+
time (ms)
|
116247 |
+
iteration 3113/ 292968 | consumed samples: 6375424 | consumed tokens: 929873920 | elapsed time per iteration (ms): 112139.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.540478E+00 | loss scale: 131072.0 | grad norm: 43233.715 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116248 |
+
time (ms)
|
116249 |
+
iteration 3114/ 292968 | consumed samples: 6377472 | consumed tokens: 930349056 | elapsed time per iteration (ms): 109783.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.507294E+00 | loss scale: 131072.0 | grad norm: 38971.265 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116250 |
+
time (ms)
|
116251 |
+
iteration 3115/ 292968 | consumed samples: 6379520 | consumed tokens: 930824192 | elapsed time per iteration (ms): 109544.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.527198E+00 | loss scale: 131072.0 | grad norm: 39431.657 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116252 |
+
time (ms)
|
116253 |
+
iteration 3116/ 292968 | consumed samples: 6381568 | consumed tokens: 931299328 | elapsed time per iteration (ms): 109791.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.543795E+00 | loss scale: 131072.0 | grad norm: 35911.906 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116254 |
+
time (ms)
|
116255 |
+
iteration 3117/ 292968 | consumed samples: 6383616 | consumed tokens: 931774464 | elapsed time per iteration (ms): 109068.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.553530E+00 | loss scale: 131072.0 | grad norm: 31794.593 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116256 |
+
time (ms)
|
116257 |
+
iteration 3118/ 292968 | consumed samples: 6385664 | consumed tokens: 932249600 | elapsed time per iteration (ms): 111130.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.521324E+00 | loss scale: 131072.0 | grad norm: 37780.759 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116258 |
+
time (ms)
|
116259 |
+
iteration 3119/ 292968 | consumed samples: 6387712 | consumed tokens: 932724736 | elapsed time per iteration (ms): 110038.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.533055E+00 | loss scale: 131072.0 | grad norm: 36496.675 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116260 |
+
time (ms)
|
116261 |
+
iteration 3120/ 292968 | consumed samples: 6389760 | consumed tokens: 933199872 | elapsed time per iteration (ms): 110677.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.529382E+00 | loss scale: 131072.0 | grad norm: 35531.822 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116262 |
+
time (ms)
|
116263 |
+
iteration 3121/ 292968 | consumed samples: 6391808 | consumed tokens: 933675008 | elapsed time per iteration (ms): 111492.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.548238E+00 | loss scale: 131072.0 | grad norm: 44060.029 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116264 |
+
time (ms)
|
116265 |
+
iteration 3122/ 292968 | consumed samples: 6393856 | consumed tokens: 934150144 | elapsed time per iteration (ms): 110377.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.546454E+00 | loss scale: 131072.0 | grad norm: 50136.311 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116266 |
+
time (ms)
|
116267 |
+
iteration 3123/ 292968 | consumed samples: 6395904 | consumed tokens: 934625280 | elapsed time per iteration (ms): 110624.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.544405E+00 | loss scale: 131072.0 | grad norm: 58389.993 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116268 |
+
time (ms)
|
116269 |
+
iteration 3124/ 292968 | consumed samples: 6397952 | consumed tokens: 935100416 | elapsed time per iteration (ms): 109969.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.526251E+00 | loss scale: 131072.0 | grad norm: 46223.258 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116270 |
+
time (ms)
|
116271 |
+
iteration 3125/ 292968 | consumed samples: 6400000 | consumed tokens: 935575552 | elapsed time per iteration (ms): 110453.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.521111E+00 | loss scale: 131072.0 | grad norm: 47758.541 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116272 |
+
time (ms)
|
116273 |
+
iteration 3126/ 292968 | consumed samples: 6402048 | consumed tokens: 936050688 | elapsed time per iteration (ms): 110497.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.529240E+00 | loss scale: 131072.0 | grad norm: 43012.626 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116274 |
+
time (ms)
|
116275 |
+
iteration 3127/ 292968 | consumed samples: 6404096 | consumed tokens: 936525824 | elapsed time per iteration (ms): 109904.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.557906E+00 | loss scale: 131072.0 | grad norm: 38612.784 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116276 |
+
time (ms)
|
116277 |
+
iteration 3128/ 292968 | consumed samples: 6406144 | consumed tokens: 937000960 | elapsed time per iteration (ms): 108913.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.531578E+00 | loss scale: 131072.0 | grad norm: 41133.483 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116278 |
+
time (ms)
|
116279 |
+
iteration 3129/ 292968 | consumed samples: 6408192 | consumed tokens: 937476096 | elapsed time per iteration (ms): 110361.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.493152E+00 | loss scale: 131072.0 | grad norm: 37207.043 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116280 |
+
time (ms)
|
116281 |
+
iteration 3130/ 292968 | consumed samples: 6410240 | consumed tokens: 937951232 | elapsed time per iteration (ms): 110191.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.501123E+00 | loss scale: 131072.0 | grad norm: 38164.113 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116282 |
+
time (ms)
|
116283 |
+
iteration 3131/ 292968 | consumed samples: 6412288 | consumed tokens: 938426368 | elapsed time per iteration (ms): 111607.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.532626E+00 | loss scale: 131072.0 | grad norm: 33127.285 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116284 |
+
time (ms)
|
116285 |
+
iteration 3132/ 292968 | consumed samples: 6414336 | consumed tokens: 938901504 | elapsed time per iteration (ms): 109170.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.537379E+00 | loss scale: 131072.0 | grad norm: 34759.150 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116286 |
+
time (ms)
|
116287 |
+
iteration 3133/ 292968 | consumed samples: 6416384 | consumed tokens: 939376640 | elapsed time per iteration (ms): 110333.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.516551E+00 | loss scale: 131072.0 | grad norm: 34224.005 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116288 |
+
time (ms)
|
116289 |
+
iteration 3134/ 292968 | consumed samples: 6418432 | consumed tokens: 939851776 | elapsed time per iteration (ms): 111757.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.518162E+00 | loss scale: 131072.0 | grad norm: 36712.897 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116290 |
+
time (ms)
|
116291 |
+
iteration 3135/ 292968 | consumed samples: 6420480 | consumed tokens: 940326912 | elapsed time per iteration (ms): 110838.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.531169E+00 | loss scale: 131072.0 | grad norm: 45871.157 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116292 |
+
time (ms)
|
116293 |
+
iteration 3136/ 292968 | consumed samples: 6422528 | consumed tokens: 940802048 | elapsed time per iteration (ms): 111400.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.530567E+00 | loss scale: 131072.0 | grad norm: 40914.375 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116294 |
+
time (ms)
|
116295 |
+
iteration 3137/ 292968 | consumed samples: 6424576 | consumed tokens: 941277184 | elapsed time per iteration (ms): 110125.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.528954E+00 | loss scale: 131072.0 | grad norm: 46189.970 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116296 |
+
time (ms)
|
116297 |
+
iteration 3138/ 292968 | consumed samples: 6426624 | consumed tokens: 941752320 | elapsed time per iteration (ms): 110514.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.515621E+00 | loss scale: 131072.0 | grad norm: 47886.409 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116298 |
+
time (ms)
|
116299 |
+
iteration 3139/ 292968 | consumed samples: 6428672 | consumed tokens: 942227456 | elapsed time per iteration (ms): 111383.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.544991E+00 | loss scale: 131072.0 | grad norm: 44561.397 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116300 |
+
time (ms)
|
116301 |
+
iteration 3140/ 292968 | consumed samples: 6430720 | consumed tokens: 942702592 | elapsed time per iteration (ms): 109945.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.550672E+00 | loss scale: 131072.0 | grad norm: 55870.403 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116302 |
+
time (ms)
|
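The added log lines share one pipe-delimited `key: value` layout, so they can be checked mechanically. The following is a minimal sketch, not part of the training code: the helper name `parse_log_line` is hypothetical, and the two sample lines are copied from the log above. It parses each line into a dictionary and checks that the per-iteration increase in `consumed tokens` equals `global batch size` times `curriculum seqlen`. That relationship holds for these two lines, but it is an assumption that it holds elsewhere in the log.

```python
import re

# Two consecutive lines copied verbatim from the log above.
LINES = [
    "iteration 3107/ 292968 | consumed samples: 6363136 | consumed tokens: 927023104 | elapsed time per iteration (ms): 110093.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.565519E+00 | loss scale: 131072.0 | grad norm: 33777.532 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |",
    "iteration 3108/ 292968 | consumed samples: 6365184 | consumed tokens: 927498240 | elapsed time per iteration (ms): 110410.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.537144E+00 | loss scale: 131072.0 | grad norm: 40416.615 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |",
]


def parse_log_line(line):
    """Parse one 'iteration ...' line into a dict keyed by field name.

    Hypothetical helper for illustration; it assumes the exact layout
    seen in this log and raises ValueError on anything else.
    """
    fields = [f.strip() for f in line.split("|") if f.strip()]
    # The first field looks like "iteration 3107/ 292968".
    match = re.match(r"iteration\s+(\d+)\s*/\s*(\d+)", fields[0])
    if match is None:
        raise ValueError("not an iteration line")
    record = {
        "iteration": int(match.group(1)),
        "total iterations": int(match.group(2)),
    }
    # Remaining fields are "name: number"; split on the last colon so
    # names such as "elapsed time per iteration (ms)" stay intact.
    for field in fields[1:]:
        key, _, value = field.rpartition(":")
        record[key.strip()] = float(value)
    return record


prev, curr = (parse_log_line(line) for line in LINES)

# Tokens consumed by the later iteration, from the running counter.
token_delta = curr["consumed tokens"] - prev["consumed tokens"]
# Tokens expected if every sample has the curriculum sequence length.
expected = curr["global batch size"] * curr["curriculum seqlen"]
# Rough throughput for the later iteration, in tokens per second.
tokens_per_second = token_delta / (curr["elapsed time per iteration (ms)"] / 1000.0)

print(token_delta == expected, round(tokens_per_second, 1))
```

Splitting on the last colon rather than the first keeps the sketch tolerant of field names that contain punctuation. Lines that are not iteration records, such as the `time (ms)` separators, are rejected with `ValueError` and can be skipped by the caller.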