zh_wiki_small

This model was fine-tuned on the wikipedia dataset; the card does not name the base model. It achieves the following results on the evaluation set:

  • Loss: 0.4159

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.00015
  • train_batch_size: 32
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 256
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • training_steps: 500000
  • mixed_precision_training: Native AMP
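The derived values in this list can be checked with a little arithmetic. The sketch below recomputes the total batch sizes and the warmup length from the listed settings. It also models the learning-rate curve under an assumption: linear warmup from zero followed by a single half-cosine decay to zero, which is the usual shape of a cosine schedule with warmup. The card itself does not spell out the exact schedule, and the constant and function names are illustrative.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 0.00015
TRAIN_BATCH_SIZE = 32    # per device
EVAL_BATCH_SIZE = 4      # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.05
TRAINING_STEPS = 500_000

# Effective batch sizes: per-device size x devices (x accumulation when training).
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES

# Warmup length implied by the warmup ratio.
warmup_steps = round(WARMUP_RATIO * TRAINING_STEPS)


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup from 0, then half-cosine decay to 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TRAINING_STEPS - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size)  # 256
print(total_eval_batch_size)   # 16
print(warmup_steps)            # 25000
```

Under these assumptions the learning rate peaks at 0.00015 at step 25000 and decays towards zero by step 500000.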

Training results

Training Loss    Epoch    Step    Validation Loss
0.4793 0.14 1000 0.4540
0.4701 0.28 2000 0.4467
0.4673 0.42 3000 0.4304
0.4669 0.55 4000 0.4413
0.4668 0.69 5000 0.4368
0.4676 0.83 6000 0.4358
0.4691 0.97 7000 0.4367
0.4693 1.11 8000 0.4429
0.4709 1.25 9000 0.4388
0.4722 1.39 10000 0.4453
0.4729 1.53 11000 0.4415
0.4732 1.67 12000 0.4510
0.4751 1.8 13000 0.4461
0.4765 1.94 14000 0.4448
0.477 2.08 15000 0.4498
0.4779 2.22 16000 0.4447
0.4795 2.36 17000 0.4430
0.481 2.5 18000 0.4499
0.4821 2.64 19000 0.4551
0.4829 2.78 20000 0.4519
0.4838 2.91 21000 0.4520
0.4856 3.05 22000 0.4633
0.4857 3.19 23000 0.4576
0.4869 3.33 24000 0.4485
0.4882 3.47 25000 0.4591
0.4883 3.61 26000 0.4645
0.4889 3.75 27000 0.4570
0.4884 3.89 28000 0.4572
0.4897 4.02 29000 0.4553
0.4883 4.16 30000 0.4534
0.4881 4.3 31000 0.4587
0.4889 4.44 32000 0.4632
0.4886 4.58 33000 0.4587
0.4883 4.72 34000 0.4621
0.4876 4.86 35000 0.4522
0.4878 5.0 36000 0.4560
0.4883 5.13 37000 0.4579
0.4882 5.27 38000 0.4554
0.4883 5.41 39000 0.4588
0.4872 5.55 40000 0.4561
0.4868 5.69 41000 0.4614
0.4875 5.83 42000 0.4584
0.4868 5.97 43000 0.4619
0.4874 6.11 44000 0.4519
0.4874 6.24 45000 0.4625
0.487 6.38 46000 0.4579
0.4872 6.52 47000 0.4534
0.4872 6.66 48000 0.4516
0.4865 6.8 49000 0.4635
0.4865 6.94 50000 0.4610
0.4863 7.08 51000 0.4515
0.4861 7.22 52000 0.4584
0.4866 7.35 53000 0.4541
0.4862 7.49 54000 0.4508
0.4863 7.63 55000 0.4565
0.486 7.77 56000 0.4665
0.486 7.91 57000 0.4565
0.4861 8.05 58000 0.4580
0.4852 8.19 59000 0.4596
0.4846 8.33 60000 0.4527
0.4848 8.46 61000 0.4505
0.4849 8.6 62000 0.4407
0.4851 8.74 63000 0.4579
0.4848 8.88 64000 0.4559
0.4851 9.02 65000 0.4505
0.4846 9.16 66000 0.4615
0.4842 9.3 67000 0.4618
0.484 9.44 68000 0.4559
0.4841 9.57 69000 0.4613
0.484 9.71 70000 0.4527
0.4842 9.85 71000 0.4483
0.4842 9.99 72000 0.4585
0.4837 10.13 73000 0.4585
0.4833 10.27 74000 0.4541
0.4836 10.41 75000 0.4528
0.4832 10.55 76000 0.4475
0.4836 10.68 77000 0.4525
0.4826 10.82 78000 0.4562
0.4824 10.96 79000 0.4502
0.4828 11.1 80000 0.4529
0.4829 11.24 81000 0.4524
0.4823 11.38 82000 0.4506
0.4827 11.52 83000 0.4511
0.4823 11.66 84000 0.4506
0.4827 11.79 85000 0.4561
0.4832 11.93 86000 0.4471
0.482 12.07 87000 0.4479
0.4819 12.21 88000 0.4561
0.4816 12.35 89000 0.4590
0.4818 12.49 90000 0.4469
0.4815 12.63 91000 0.4633
0.4822 12.77 92000 0.4566
0.4816 12.9 93000 0.4548
0.4824 13.04 94000 0.4548
0.4812 13.18 95000 0.4533
0.4809 13.32 96000 0.4546
0.481 13.46 97000 0.4590
0.4807 13.6 98000 0.4465
0.4808 13.74 99000 0.4531
0.4806 13.88 100000 0.4459
0.4809 14.01 101000 0.4517
0.4801 14.15 102000 0.4519
0.4801 14.29 103000 0.4547
0.4805 14.43 104000 0.4517
0.4799 14.57 105000 0.4491
0.4805 14.71 106000 0.4559
0.48 14.85 107000 0.4551
0.4796 14.99 108000 0.4537
0.4801 15.12 109000 0.4509
0.4797 15.26 110000 0.4482
0.4798 15.4 111000 0.4466
0.4789 15.54 112000 0.4445
0.4808 15.68 113000 0.4493
0.4789 15.82 114000 0.4475
0.4792 15.96 115000 0.4543
0.4787 16.1 116000 0.4471
0.4796 16.23 117000 0.4565
0.4787 16.37 118000 0.4515
0.4788 16.51 119000 0.4449
0.4783 16.65 120000 0.4454
0.4787 16.79 121000 0.4486
0.4789 16.93 122000 0.4480
0.4782 17.07 123000 0.4529
0.4782 17.21 124000 0.4481
0.4777 17.34 125000 0.4528
0.4779 17.48 126000 0.4514
0.4781 17.62 127000 0.4520
0.4776 17.76 128000 0.4495
0.4777 17.9 129000 0.4501
0.4783 18.04 130000 0.4528
0.4771 18.18 131000 0.4498
0.4775 18.32 132000 0.4525
0.4772 18.45 133000 0.4482
0.4775 18.59 134000 0.4532
0.4769 18.73 135000 0.4537
0.4776 18.87 136000 0.4509
0.4775 19.01 137000 0.4464
0.4769 19.15 138000 0.4464
0.4772 19.29 139000 0.4499
0.4766 19.43 140000 0.4428
0.4764 19.56 141000 0.4536
0.477 19.7 142000 0.4444
0.4764 19.84 143000 0.4482
0.4764 19.98 144000 0.4510
0.4763 20.12 145000 0.4519
0.4761 20.26 146000 0.4452
0.4761 20.4 147000 0.4476
0.4756 20.54 148000 0.4494
0.4757 20.67 149000 0.4544
0.4762 20.81 150000 0.4412
0.4757 20.95 151000 0.4459
0.4749 21.09 152000 0.4532
0.4752 21.23 153000 0.4477
0.4749 21.37 154000 0.4396
0.4764 21.51 155000 0.4466
0.4753 21.65 156000 0.4523
0.4755 21.78 157000 0.4582
0.4749 21.92 158000 0.4539
0.475 22.06 159000 0.4539
0.4747 22.2 160000 0.4519
0.4745 22.34 161000 0.4370
0.4748 22.48 162000 0.4449
0.4743 22.62 163000 0.4484
0.4745 22.76 164000 0.4471
0.4739 22.89 165000 0.4480
0.4746 23.03 166000 0.4519
0.4739 23.17 167000 0.4478
0.4739 23.31 168000 0.4497
0.4738 23.45 169000 0.4462
0.474 23.59 170000 0.4430
0.4737 23.73 171000 0.4483
0.4737 23.87 172000 0.4508
0.474 24.0 173000 0.4439
0.4729 24.14 174000 0.4426
0.4735 24.28 175000 0.4433
0.4722 24.42 176000 0.4483
0.4728 24.56 177000 0.4496
0.4727 24.7 178000 0.4473
0.4729 24.84 179000 0.4404
0.4722 24.98 180000 0.4426
0.4724 25.11 181000 0.4479
0.4739 25.25 182000 0.4430
0.4723 25.39 183000 0.4418
0.4724 25.53 184000 0.4371
0.472 25.67 185000 0.4456
0.4726 25.81 186000 0.4419
0.4721 25.95 187000 0.4417
0.4722 26.09 188000 0.4475
0.4715 26.22 189000 0.4389
0.4717 26.36 190000 0.4451
0.4716 26.5 191000 0.4440
0.4714 26.64 192000 0.4399
0.4712 26.78 193000 0.4398
0.4709 26.92 194000 0.4424
0.4714 27.06 195000 0.4533
0.4706 27.2 196000 0.4394
0.471 27.33 197000 0.4436
0.4707 27.47 198000 0.4421
0.471 27.61 199000 0.4459
0.4707 27.75 200000 0.4439
0.471 27.89 201000 0.4467
0.471 28.03 202000 0.4439
0.4704 28.17 203000 0.4445
0.4705 28.31 204000 0.4429
0.4706 28.44 205000 0.4382
0.4703 28.58 206000 0.4425
0.4695 28.72 207000 0.4414
0.4696 28.86 208000 0.4405
0.4696 29.0 209000 0.4460
0.4701 29.14 210000 0.4460
0.4696 29.28 211000 0.4397
0.4693 29.42 212000 0.4439
0.4694 29.55 213000 0.4495
0.469 29.69 214000 0.4466
0.4691 29.83 215000 0.4336
0.4694 29.97 216000 0.4377
0.4698 30.11 217000 0.4356
0.4689 30.25 218000 0.4381
0.4685 30.39 219000 0.4431
0.4688 30.53 220000 0.4411
0.4687 30.66 221000 0.4445
0.4685 30.8 222000 0.4432
0.4687 30.94 223000 0.4383
0.4681 31.08 224000 0.4371
0.4683 31.22 225000 0.4384
0.4678 31.36 226000 0.4396
0.4682 31.5 227000 0.4387
0.4671 31.64 228000 0.4382
0.4676 31.77 229000 0.4410
0.4681 31.91 230000 0.4391
0.4676 32.05 231000 0.4429
0.4673 32.19 232000 0.4395
0.4669 32.33 233000 0.4389
0.4675 32.47 234000 0.4452
0.4667 32.61 235000 0.4395
0.4667 32.75 236000 0.4460
0.4672 32.88 237000 0.4404
0.4667 33.02 238000 0.4372
0.4663 33.16 239000 0.4362
0.4669 33.3 240000 0.4428
0.4662 33.44 241000 0.4370
0.4662 33.58 242000 0.4382
0.466 33.72 243000 0.4395
0.4661 33.86 244000 0.4418
0.4663 33.99 245000 0.4407
0.4661 34.13 246000 0.4346
0.4652 34.27 247000 0.4392
0.4662 34.41 248000 0.4396
0.4655 34.55 249000 0.4427
0.4657 34.69 250000 0.4484
0.4654 34.83 251000 0.4268
0.4655 34.97 252000 0.4384
0.4649 35.1 253000 0.4383
0.465 35.24 254000 0.4368
0.4648 35.38 255000 0.4327
0.4647 35.52 256000 0.4416
0.4652 35.66 257000 0.4390
0.4646 35.8 258000 0.4450
0.4651 35.94 259000 0.4354
0.4643 36.08 260000 0.4473
0.464 36.21 261000 0.4423
0.4638 36.35 262000 0.4339
0.464 36.49 263000 0.4438
0.464 36.63 264000 0.4398
0.4637 36.77 265000 0.4352
0.4641 36.91 266000 0.4352
0.4651 37.05 267000 0.4324
0.4637 37.19 268000 0.4341
0.4633 37.32 269000 0.4331
0.4639 37.46 270000 0.4391
0.463 37.6 271000 0.4380
0.4635 37.74 272000 0.4355
0.4631 37.88 273000 0.4397
0.464 38.02 274000 0.4336
0.4629 38.16 275000 0.4339
0.4634 38.3 276000 0.4355
0.4632 38.43 277000 0.4388
0.4628 38.57 278000 0.4341
0.4621 38.71 279000 0.4337
0.4626 38.85 280000 0.4340
0.462 38.99 281000 0.4306
0.8286 39.13 282000 0.4504
0.4624 39.27 283000 0.4399
0.4621 39.41 284000 0.4351
0.4622 39.54 285000 0.4304
0.4619 39.68 286000 0.4329
0.4618 39.82 287000 0.4208
0.462 39.96 288000 0.4414
0.4615 40.1 289000 0.4353
0.4614 40.24 290000 0.4398
0.4611 40.38 291000 0.4371
0.4608 40.52 292000 0.4326
0.4611 40.65 293000 0.4332
0.4614 40.79 294000 0.4343
0.4609 40.93 295000 0.4306
0.4608 41.07 296000 0.4323
0.4608 41.21 297000 0.4321
0.4601 41.35 298000 0.4330
0.4606 41.49 299000 0.4361
0.4606 41.63 300000 0.4367
0.46 41.76 301000 0.4327
0.4596 41.9 302000 0.4306
0.46 42.04 303000 0.4352
0.46 42.18 304000 0.4338
0.4597 42.32 305000 0.4333
0.4596 42.46 306000 0.4334
0.4591 42.6 307000 0.4334
0.4597 42.74 308000 0.4319
0.4586 42.87 309000 0.4268
0.4593 43.01 310000 0.4366
0.4591 43.15 311000 0.4283
0.4587 43.29 312000 0.4289
0.4594 43.43 313000 0.4332
0.459 43.57 314000 0.4326
0.4586 43.71 315000 0.4356
0.4581 43.85 316000 0.4271
0.4584 43.98 317000 0.4325
0.4586 44.12 318000 0.4350
0.4584 44.26 319000 0.4273
0.4576 44.4 320000 0.4284
0.458 44.54 321000 0.4331
0.4581 44.68 322000 0.4263
0.4579 44.82 323000 0.4283
0.4583 44.96 324000 0.4362
0.4571 45.1 325000 0.4330
0.4566 45.23 326000 0.4300
0.4572 45.37 327000 0.4258
0.4574 45.51 328000 0.4200
0.4573 45.65 329000 0.4299
0.4578 45.79 330000 0.4319
0.4576 45.93 331000 0.4352
0.4574 46.07 332000 0.4278
0.4572 46.21 333000 0.4326
0.4568 46.34 334000 0.4295
0.4569 46.48 335000 0.4300
0.4566 46.62 336000 0.4333
0.4567 46.76 337000 0.4262
0.4564 46.9 338000 0.4354
0.4574 47.04 339000 0.4357
0.4564 47.18 340000 0.4308
0.4554 47.32 341000 0.4350
0.456 47.45 342000 0.4400
0.456 47.59 343000 0.4237
0.4559 47.73 344000 0.4236
0.4559 47.87 345000 0.4305
0.4559 48.01 346000 0.4245
0.4549 48.15 347000 0.4182
0.4556 48.29 348000 0.4330
0.4551 48.43 349000 0.4397
0.455 48.56 350000 0.4252
0.4548 48.7 351000 0.4246
0.4551 48.84 352000 0.4291
0.4554 48.98 353000 0.4286
0.4547 49.12 354000 0.4336
0.4548 49.26 355000 0.4324
0.4545 49.4 356000 0.4236
0.4547 49.54 357000 0.4345
0.4542 49.67 358000 0.4329
0.4545 49.81 359000 0.4241
0.4541 49.95 360000 0.4177
0.454 50.09 361000 0.4244
0.4538 50.23 362000 0.4190
0.4535 50.37 363000 0.4331
0.4545 50.51 364000 0.4252
0.454 50.65 365000 0.4315
0.4536 50.78 366000 0.4301
0.4534 50.92 367000 0.4357
0.4537 51.06 368000 0.4334
0.4535 51.2 369000 0.4200
0.4538 51.34 370000 0.4274
0.4536 51.48 371000 0.4178
0.4534 51.62 372000 0.4181
0.4533 51.76 373000 0.4211
0.4535 51.89 374000 0.4290
0.4535 52.03 375000 0.4201
0.4526 52.17 376000 0.4263
0.4526 52.31 377000 0.4237
0.4524 52.45 378000 0.4254
0.4529 52.59 379000 0.4260
0.4531 52.73 380000 0.4202
0.4523 52.87 381000 0.4223
0.4523 53.0 382000 0.4271
0.4522 53.14 383000 0.4286
0.4524 53.28 384000 0.4256
0.4515 53.42 385000 0.4221
0.4513 53.56 386000 0.4255
0.452 53.7 387000 0.4270
0.4519 53.84 388000 0.4222
0.4518 53.98 389000 0.4233
0.4513 54.11 390000 0.4233
0.4517 54.25 391000 0.4239
0.4518 54.39 392000 0.4273
0.4508 54.53 393000 0.4200
0.4511 54.67 394000 0.4236
0.4508 54.81 395000 0.4193
0.4507 54.95 396000 0.4293
0.4508 55.09 397000 0.4187
0.4504 55.22 398000 0.4283
0.4512 55.36 399000 0.4239
0.4504 55.5 400000 0.4269
0.4506 55.64 401000 0.4291
0.4504 55.78 402000 0.4238
0.4503 55.92 403000 0.4200
0.4506 56.06 404000 0.4186
0.4507 56.2 405000 0.4260
0.4504 56.33 406000 0.4188
0.4503 56.47 407000 0.4231
0.4498 56.61 408000 0.4148
0.4499 56.75 409000 0.4182
0.4498 56.89 410000 0.4229
0.4501 57.03 411000 0.4252
0.4497 57.17 412000 0.4220
0.45 57.31 413000 0.4181
0.4497 57.44 414000 0.4270
0.4497 57.58 415000 0.4208
0.4499 57.72 416000 0.4224
0.4496 57.86 417000 0.4207
0.4494 58.0 418000 0.4268
0.4499 58.14 419000 0.4240
0.4495 58.28 420000 0.4294
0.4487 58.42 421000 0.4207
0.4495 58.55 422000 0.4246
0.4491 58.69 423000 0.4213
0.4492 58.83 424000 0.4241
0.4486 58.97 425000 0.4247
0.4485 59.11 426000 0.4163
0.4489 59.25 427000 0.4239
0.4483 59.39 428000 0.4240
0.4491 59.53 429000 0.4214
0.4485 59.66 430000 0.4285
0.449 59.8 431000 0.4265
0.4484 59.94 432000 0.4188
0.4484 60.08 433000 0.4176
0.4488 60.22 434000 0.4200
0.448 60.36 435000 0.4116
0.4477 60.5 436000 0.4215
0.4484 60.64 437000 0.4204
0.448 60.77 438000 0.4093
0.4479 60.91 439000 0.4181
0.4481 61.05 440000 0.4232
0.4477 61.19 441000 0.4202
0.4478 61.33 442000 0.4167
0.4481 61.47 443000 0.4173
0.4483 61.61 444000 0.4158
0.4473 61.75 445000 0.4174
0.4474 61.88 446000 0.4266
0.4477 62.02 447000 0.4242
0.4476 62.16 448000 0.4240
0.4478 62.3 449000 0.4286
0.4474 62.44 450000 0.4294
0.4482 62.58 451000 0.4144
0.4471 62.72 452000 0.4316
0.448 62.86 453000 0.4228
0.4474 62.99 454000 0.4242
0.447 63.13 455000 0.4231
0.4475 63.27 456000 0.4235
0.4475 63.41 457000 0.4279
0.4476 63.55 458000 0.4230
0.4464 63.69 459000 0.4145
0.4467 63.83 460000 0.4230
0.4465 63.97 461000 0.4208
0.4466 64.1 462000 0.4243
0.447 64.24 463000 0.4220
0.4473 64.38 464000 0.4253
0.4471 64.52 465000 0.4194
0.447 64.66 466000 0.4262
0.447 64.8 467000 0.4245
0.4468 64.94 468000 0.4143
0.4463 65.08 469000 0.4187
0.4465 65.21 470000 0.4185
0.4465 65.35 471000 0.4244
0.4467 65.49 472000 0.4201
0.4465 65.63 473000 0.4160
0.4467 65.77 474000 0.4273
0.4465 65.91 475000 0.4183
0.4467 66.05 476000 0.4227
0.4469 66.19 477000 0.4166
0.4467 66.32 478000 0.4199
0.4464 66.46 479000 0.4181
0.4463 66.6 480000 0.4217
0.4464 66.74 481000 0.4158
0.4468 66.88 482000 0.4191
0.447 67.02 483000 0.4248
0.4465 67.16 484000 0.4234
0.4463 67.3 485000 0.4238
0.446 67.43 486000 0.4162
0.4462 67.57 487000 0.4202
0.4462 67.71 488000 0.4177
0.4455 67.85 489000 0.4228
0.4463 67.99 490000 0.4146
0.4454 68.13 491000 0.4190
0.446 68.27 492000 0.4219
0.4461 68.41 493000 0.4250
0.4462 68.54 494000 0.4172
0.4464 68.68 495000 0.4122
0.4459 68.82 496000 0.4178
0.4459 68.96 497000 0.4095
0.4458 69.1 498000 0.4124
0.4458 69.24 499000 0.4182
0.4458 69.38 500000 0.4177
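A log of this length is easier to inspect programmatically. The following is a minimal sketch that parses whitespace-separated rows in the layout used above (training loss, epoch, step, validation loss). It finds the row with the lowest validation loss, flags training-loss spikes, and estimates the number of optimizer steps per epoch. Only five rows are copied here to keep the example short; `LOG_ROWS`, `parse_rows`, and the 1.5x-median spike threshold are illustrative choices, not part of the original training setup.

```python
from statistics import median

# A handful of rows copied verbatim from the log above.
# Paste the full log here to analyse the whole run.
LOG_ROWS = """\
0.4793 0.14 1000 0.4540
0.4878 5.0 36000 0.4560
0.8286 39.13 282000 0.4504
0.448 60.77 438000 0.4093
0.4458 69.38 500000 0.4177
"""


def parse_rows(text):
    """Turn 'train_loss epoch step val_loss' lines into dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        train_loss, epoch, step, val_loss = parts
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return rows


rows = parse_rows(LOG_ROWS)

# Row with the lowest validation loss among the rows provided.
best = min(rows, key=lambda r: r["val_loss"])

# Flag rows whose training loss is far above the typical value.
typical = median(r["train_loss"] for r in rows)
spikes = [r["step"] for r in rows if r["train_loss"] > 1.5 * typical]

# Rough steps-per-epoch estimate from the final row.
steps_per_epoch = rows[-1]["step"] / rows[-1]["epoch"]

print(best["step"], best["val_loss"])
print(spikes)
print(round(steps_per_epoch))
```

On these sample rows the spike check picks out step 282000, where the logged training loss is 0.8286. The steps-per-epoch estimate is about 7,207; multiplied by the total training batch size of 256, that would suggest roughly 1.8 million training examples per epoch, assuming every step consumes one full effective batch.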

Framework versions

  • Transformers 4.17.0
  • PyTorch 1.12.0
  • Datasets 2.0.0
  • Tokenizers 0.13.2
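
To compare a local environment against the versions listed above, something like the following sketch may help. It relies only on the standard library's `importlib.metadata`. The PyPI distribution names are assumptions (notably `torch` for PyTorch), and `check_versions` is a hypothetical helper rather than part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above. The distribution names are assumed PyPI names.
EXPECTED = {
    "transformers": "4.17.0",
    "torch": "1.12.0",
    "datasets": "2.0.0",
    "tokenizers": "0.13.2",
}


def check_versions(expected):
    """Return 'ok', 'missing', or 'mismatch (...)' for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # Local builds may carry a suffix such as "+cu116"; compare the public part.
        if installed.split("+")[0] == wanted:
            report[name] = "ok"
        else:
            report[name] = f"mismatch ({installed})"
    return report


for name, status in check_versions(EXPECTED).items():
    print(f"{name}: {status}")
```

A mismatch does not necessarily mean the model will fail to load; it only indicates that the environment differs from the one recorded in this card.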