sinhala_albert

This model is a fine-tuned version of albert-base-v2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.5337
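
The card does not state the training objective or task. As a usage illustration only, the snippet below is a minimal sketch that assumes the checkpoint keeps ALBERT's masked-language-modeling head; if the model was fine-tuned for a different task, swap in the matching AutoModelFor* class.

```python
# A minimal sketch, assuming an MLM head; the example sentence is a placeholder.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "theekshana/sinhala_albert"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = f"some Sinhala text containing one {tokenizer.mask_token}"  # placeholder input
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Report the top prediction at each masked position.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)
top_ids = logits[mask_positions].argmax(dim=-1)
print(tokenizer.decode(top_ids))
```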

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 128
  • eval_batch_size: 128
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 100
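
For reference, these settings map onto Hugging Face TrainingArguments roughly as sketched below. This is a reconstruction under the assumption that the Trainer API was used; output_dir is hypothetical, and the batch sizes are treated as per-device values.

```python
# A hedged reconstruction of the reported settings, not the original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sinhala_albert",       # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=128,   # assuming a single device
    per_device_eval_batch_size=128,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=100,
    adam_beta1=0.9,                    # Adam betas/epsilon as reported above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```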

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.0056 | 1.0 | 83 | 1.0130 |
| 0.9992 | 2.0 | 166 | 1.0021 |
| 0.9774 | 3.0 | 249 | 0.9789 |
| 0.9323 | 4.0 | 332 | 0.9695 |
| 0.863 | 5.0 | 415 | 0.9616 |
| 0.7944 | 6.0 | 498 | 0.9871 |
| 0.6328 | 7.0 | 581 | 1.0075 |
| 0.4705 | 8.0 | 664 | 1.4911 |
| 0.2834 | 9.0 | 747 | 1.5777 |
| 0.2278 | 10.0 | 830 | 1.6406 |
| 0.1751 | 11.0 | 913 | 1.7568 |
| 0.1657 | 12.0 | 996 | 1.7089 |
| 0.0974 | 13.0 | 1079 | 1.8463 |
| 0.1562 | 14.0 | 1162 | 1.9219 |
| 0.118 | 15.0 | 1245 | 1.9384 |
| 0.1044 | 16.0 | 1328 | 1.9971 |
| 0.0952 | 17.0 | 1411 | 2.1732 |
| 0.0877 | 18.0 | 1494 | 2.0566 |
| 0.0598 | 19.0 | 1577 | 2.4616 |
| 0.0762 | 20.0 | 1660 | 2.2672 |
| 0.1003 | 21.0 | 1743 | 2.3471 |
| 0.0627 | 22.0 | 1826 | 2.2526 |
| 0.0584 | 23.0 | 1909 | 2.7092 |
| 0.0679 | 24.0 | 1992 | 2.1629 |
| 0.0538 | 25.0 | 2075 | 2.5745 |
| 0.0723 | 26.0 | 2158 | 2.5667 |
| 0.0564 | 27.0 | 2241 | 2.4331 |
| 0.0662 | 28.0 | 2324 | 2.7811 |
| 0.0226 | 29.0 | 2407 | 2.8163 |
| 0.0224 | 30.0 | 2490 | 2.7452 |
| 0.0344 | 31.0 | 2573 | 2.6642 |
| 0.0519 | 32.0 | 2656 | 2.3490 |
| 0.0478 | 33.0 | 2739 | 2.7382 |
| 0.0436 | 34.0 | 2822 | 2.7556 |
| 0.0474 | 35.0 | 2905 | 2.5449 |
| 0.0355 | 36.0 | 2988 | 2.8280 |
| 0.0343 | 37.0 | 3071 | 2.9405 |
| 0.0283 | 38.0 | 3154 | 2.9870 |
| 0.0446 | 39.0 | 3237 | 3.0252 |
| 0.0288 | 40.0 | 3320 | 3.0866 |
| 0.0134 | 41.0 | 3403 | 3.1549 |
| 0.0328 | 42.0 | 3486 | 3.0168 |
| 0.0159 | 43.0 | 3569 | 2.8753 |
| 0.0155 | 44.0 | 3652 | 3.3455 |
| 0.0087 | 45.0 | 3735 | 3.4373 |
| 0.0296 | 46.0 | 3818 | 3.1949 |
| 0.0085 | 47.0 | 3901 | 3.1817 |
| 0.0048 | 48.0 | 3984 | 3.2233 |
| 0.0122 | 49.0 | 4067 | 3.5465 |
| 0.0024 | 50.0 | 4150 | 3.5276 |
| 0.0014 | 51.0 | 4233 | 3.5111 |
| 0.0121 | 52.0 | 4316 | 3.4483 |
| 0.0159 | 53.0 | 4399 | 3.8072 |
| 0.0027 | 54.0 | 4482 | 3.7448 |
| 0.0059 | 55.0 | 4565 | 3.9230 |
| 0.0083 | 56.0 | 4648 | 3.9245 |
| 0.0128 | 57.0 | 4731 | 3.8699 |
| 0.0116 | 58.0 | 4814 | 3.9957 |
| 0.0013 | 59.0 | 4897 | 3.8153 |
| 0.0013 | 60.0 | 4980 | 3.9093 |
| 0.0035 | 61.0 | 5063 | 4.0339 |
| 0.0028 | 62.0 | 5146 | 3.9929 |
| 0.0036 | 63.0 | 5229 | 4.1217 |
| 0.004 | 64.0 | 5312 | 4.0936 |
| 0.0014 | 65.0 | 5395 | 4.1109 |
| 0.0047 | 66.0 | 5478 | 4.1978 |
| 0.0005 | 67.0 | 5561 | 4.2320 |
| 0.0009 | 68.0 | 5644 | 4.2441 |
| 0.0027 | 69.0 | 5727 | 4.2670 |
| 0.0008 | 70.0 | 5810 | 4.2923 |
| 0.0013 | 71.0 | 5893 | 4.3101 |
| 0.0006 | 72.0 | 5976 | 4.3561 |
| 0.0024 | 73.0 | 6059 | 4.3419 |
| 0.0014 | 74.0 | 6142 | 4.3432 |
| 0.0011 | 75.0 | 6225 | 4.3302 |
| 0.0 | 76.0 | 6308 | 4.3439 |
| 0.0016 | 77.0 | 6391 | 4.3667 |
| 0.0026 | 78.0 | 6474 | 4.3803 |
| 0.0031 | 79.0 | 6557 | 4.3800 |
| 0.002 | 80.0 | 6640 | 4.3941 |
| 0.0008 | 81.0 | 6723 | 4.4071 |
| 0.0019 | 82.0 | 6806 | 4.4259 |
| 0.0013 | 83.0 | 6889 | 4.4436 |
| 0.0015 | 84.0 | 6972 | 4.4603 |
| 0.0009 | 85.0 | 7055 | 4.4706 |
| 0.0019 | 86.0 | 7138 | 4.4701 |
| 0.001 | 87.0 | 7221 | 4.4777 |
| 0.0007 | 88.0 | 7304 | 4.4905 |
| 0.0021 | 89.0 | 7387 | 4.4910 |
| 0.0012 | 90.0 | 7470 | 4.4959 |
| 0.0 | 91.0 | 7553 | 4.4990 |
| 0.0024 | 92.0 | 7636 | 4.5091 |
| 0.0031 | 93.0 | 7719 | 4.5115 |
| 0.0011 | 94.0 | 7802 | 4.5215 |
| 0.0 | 95.0 | 7885 | 4.5152 |
| 0.002 | 96.0 | 7968 | 4.5200 |
| 0.0013 | 97.0 | 8051 | 4.5293 |
| 0.0013 | 98.0 | 8134 | 4.5285 |
| 0.0023 | 99.0 | 8217 | 4.5339 |
| 0.002 | 100.0 | 8300 | 4.5337 |
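
Validation loss bottoms out at epoch 5 (0.9616) and climbs steadily afterwards while training loss approaches zero, so the reported loss of 4.5337 comes from the final, heavily overfit checkpoint. If retraining, a configuration along the lines below would keep the best checkpoint instead. This is a hedged sketch assuming the Hugging Face Trainer was used; the model and dataset names are placeholders, not artifacts from this card.

```python
# A sketch, not the original training script: keep the checkpoint with the
# lowest validation loss and stop once it stops improving.
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="sinhala_albert",   # hypothetical name
    eval_strategy="epoch",         # requires transformers >= 4.41
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model=model,                   # placeholder: the fine-tuned ALBERT model
    args=args,
    train_dataset=train_dataset,   # placeholder
    eval_dataset=eval_dataset,     # placeholder
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```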

Framework versions

  • Transformers 4.41.0.dev0
  • Pytorch 2.2.1+cu118
  • Datasets 2.14.5
  • Tokenizers 0.19.1