
BitNet-based-Llama2-jp-test

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 92.3872

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 156
  • eval_batch_size: 156
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 1
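
For reference, here is a minimal sketch of how these hyperparameters could be expressed as Hugging Face TrainingArguments. The output directory is a placeholder, and treating the batch sizes as per-device values is an assumption; this is not the original training script.

```python
# Hypothetical sketch: maps the hyperparameters listed above onto
# transformers.TrainingArguments. Paths are placeholders, not from the original run.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./bitnet-llama2-jp-test",  # placeholder output directory
    learning_rate=5e-4,                    # learning_rate: 0.0005
    per_device_train_batch_size=156,       # train_batch_size: 156 (assumed per device)
    per_device_eval_batch_size=156,        # eval_batch_size: 156 (assumed per device)
    seed=42,                               # seed: 42
    adam_beta1=0.9,                        # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                     # epsilon: 1e-08
    lr_scheduler_type="cosine",            # cosine schedule
    warmup_steps=200,                      # lr_scheduler_warmup_steps: 200
    num_train_epochs=1,                    # num_epochs: 1
    evaluation_strategy="steps",           # evaluation every 100 steps,
    eval_steps=100,                        # matching the results table below
    logging_steps=100,
)
```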

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 92.3586       | 0.06  | 100  | 92.3876         |
| 92.3629       | 0.12  | 200  | 92.3877         |
| 92.3395       | 0.18  | 300  | 92.3753         |
| 92.3229       | 0.24  | 400  | 92.3346         |
| 92.3158       | 0.3   | 500  | 92.3378         |
| 92.3411       | 0.36  | 600  | 92.3068         |
| 92.3362       | 0.42  | 700  | 92.3086         |
| 92.3304       | 0.48  | 800  | 92.3751         |
| 92.3344       | 0.55  | 900  | 92.3510         |
| 92.3355       | 0.61  | 1000 | 92.3283         |
| 92.3628       | 0.67  | 1100 | 92.3356         |
| 92.337        | 0.73  | 1200 | 92.3693         |
| 92.3825       | 0.79  | 1300 | 92.3734         |
| 92.3569       | 0.85  | 1400 | 92.2878         |
| 92.3633       | 0.91  | 1500 | 92.3738         |
| 92.3392       | 0.97  | 1600 | 92.3872         |

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
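
To reproduce the environment above and load the checkpoint, something along the following lines should work. This is a hedged sketch: it assumes the checkpoint loads through the standard AutoModelForCausalLM/AutoTokenizer API, whereas BitNet-style models sometimes require custom modeling code (trust_remote_code=True or the author's training repository).

```python
# Environment sketch (versions listed above):
#   pip install transformers==4.36.2 torch==2.2.0 datasets==2.18.0 tokenizers==0.15.2
#
# Loading sketch -- assumes the checkpoint works with the standard auto classes.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "HachiML/Bit-Llama2-jp-122M-test-1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # add trust_remote_code=True if custom code is required

inputs = tokenizer("こんにちは、", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```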