---
license: other
base_model: deepseek-ai/deepseek-llm-7b-chat
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- self-generate/ds_chat_original_cn_mining_oj_iter0-binarized
- self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized
- self-generate/ds_chat_original_cn_rl_oj_iter0-binarized
model-index:
- name: ds_chat_sigmoid_iter0_2024-09-14-21.15
  results: []
---

[Visualize in Weights & Biases](https://ml.byteintl.net/experiment/tracking/detail?Id=project_20240915_20321b8f&selectedTrial=run_20240915_d060d7a7)

# ds_chat_sigmoid_iter0_2024-09-14-21.15

This model is a fine-tuned version of [deepseek-ai/deepseek-llm-7b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat) on the self-generate/ds_chat_original_cn_mining_oj_iter0-binarized, self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized, and self-generate/ds_chat_original_cn_rl_oj_iter0-binarized datasets.
It achieves the following results on the evaluation set:
- Loss: 0.7009
- Rewards/chosen: 0.3500
- Rewards/rejected: 0.0298
- Rewards/accuracies: 0.3289
- Rewards/margins: 0.3202
- Logps/rejected: -63.8274
- Logps/chosen: -122.4480
- Logits/rejected: 1.6952
- Logits/chosen: 1.6350
- Debug/policy Chosen Logits: 1.6350
- Debug/policy Rejected Logits: 1.6952
- Debug/policy Chosen Logps: -122.4480
- Debug/policy Rejected Logps: -63.8274
- Debug/reference Chosen Logps: -123.1481
- Debug/reference Rejected Logps: -63.8871

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a reproduction sketch using these values follows the list:
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- lr_scheduler_warmup_steps: 100
- num_epochs: 8.0
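The reward columns reported in this card match TRL's DPO reward definition, reward = beta x (policy log-prob minus reference log-prob): 0.5 x (-122.4480 - (-123.1481)) is approximately 0.3500, which reproduces Rewards/chosen, so the run appears to use beta = 0.5 together with the sigmoid loss named in the run id. The sketch below shows how such a run could be set up with TRL's `DPOTrainer`. Note that `beta=0.5`, `loss_type="sigmoid"`, and the simple dataset concatenation are inferences, not values stated in this card, and exact keyword names vary slightly across TRL releases.

```python
# Minimal reproduction sketch of this DPO run, assuming TRL's
# DPOConfig/DPOTrainer API. Hyperparameters are copied from the list
# above; beta and loss_type are inferred (see note above), so treat
# them as assumptions.
from datasets import concatenate_datasets, load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Preference data: the three binarized iter0 splits, concatenated.
train_dataset = concatenate_datasets([
    load_dataset(name, split="train")
    for name in [
        "self-generate/ds_chat_original_cn_mining_oj_iter0-binarized",
        "self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized",
        "self-generate/ds_chat_original_cn_rl_oj_iter0-binarized",
    ]
])

args = DPOConfig(
    output_dir="ds_chat_sigmoid_iter0",
    beta=0.5,                      # assumption, inferred from the reward columns
    loss_type="sigmoid",           # assumption, inferred from the run name
    learning_rate=1e-7,
    per_device_train_batch_size=8,  # 8 per device x 8 GPUs = 64 total
    per_device_eval_batch_size=4,
    num_train_epochs=8.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    warmup_steps=100,              # overrides warmup_ratio when both are set
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # TRL clones the policy as the frozen reference model
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```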
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Debug/policy Chosen Logits | Debug/policy Rejected Logits | Debug/policy Chosen Logps | Debug/policy Rejected Logps | Debug/reference Chosen Logps | Debug/reference Rejected Logps |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------------------------:|:----------------------------:|:-------------------------:|:---------------------------:|:----------------------------:|:------------------------------:|
| 0.6965 | 0.3623 | 100 | 0.6848 | 0.1614 | 0.0731 | 0.2895 | 0.0882 | -63.7408 | -122.8253 | 1.7215 | 1.6604 | 1.6604 | 1.7215 | -122.8253 | -63.7408 | -123.1481 | -63.8871 |
| 0.7398 | 0.7246 | 200 | 0.7128 | 0.4980 | 0.1123 | 0.3289 | 0.3857 | -63.6625 | -122.1521 | 1.7105 | 1.6513 | 1.6513 | 1.7105 | -122.1521 | -63.6625 | -123.1481 | -63.8871 |
| 0.7007 | 1.0870 | 300 | 0.6869 | 0.4063 | -0.0006 | 0.3158 | 0.4070 | -63.8883 | -122.3354 | 1.7138 | 1.6542 | 1.6542 | 1.7138 | -122.3354 | -63.8883 | -123.1481 | -63.8871 |
| 0.7084 | 1.4493 | 400 | 0.7388 | 0.4329 | 0.1275 | 0.3026 | 0.3054 | -63.6320 | -122.2823 | 1.7009 | 1.6406 | 1.6406 | 1.7009 | -122.2823 | -63.6320 | -123.1481 | -63.8871 |
| 0.693 | 1.8116 | 500 | 0.6927 | 0.1909 | -0.0563 | 0.3158 | 0.2472 | -63.9997 | -122.7663 | 1.7035 | 1.6431 | 1.6431 | 1.7035 | -122.7663 | -63.9997 | -123.1481 | -63.8871 |
| 0.6683 | 2.1739 | 600 | 0.6755 | 0.2946 | 0.0203 | 0.3421 | 0.2744 | -63.8465 | -122.5588 | 1.7045 | 1.6442 | 1.6442 | 1.7045 | -122.5588 | -63.8465 | -123.1481 | -63.8871 |
| 0.7035 | 2.5362 | 700 | 0.6899 | 0.1404 | -0.0287 | 0.3158 | 0.1691 | -63.9445 | -122.8673 | 1.7058 | 1.6448 | 1.6448 | 1.7058 | -122.8673 | -63.9445 | -123.1481 | -63.8871 |
| 0.685 | 2.8986 | 800 | 0.6978 | 0.4321 | 0.0759 | 0.3947 | 0.3562 | -63.7352 | -122.2839 | 1.7109 | 1.6500 | 1.6500 | 1.7109 | -122.2839 | -63.7352 | -123.1481 | -63.8871 |
| 0.6585 | 3.2609 | 900 | 0.7158 | 0.4197 | 0.1341 | 0.2763 | 0.2856 | -63.6189 | -122.3087 | 1.7148 | 1.6527 | 1.6527 | 1.7148 | -122.3087 | -63.6189 | -123.1481 | -63.8871 |
| 0.6654 | 3.6232 | 1000 | 0.6837 | 0.4128 | 0.0010 | 0.3947 | 0.4118 | -63.8851 | -122.3225 | 1.7064 | 1.6460 | 1.6460 | 1.7064 | -122.3225 | -63.8851 | -123.1481 | -63.8871 |
| 0.669 | 3.9855 | 1100 | 0.6801 | 0.2662 | -0.0151 | 0.3816 | 0.2813 | -63.9173 | -122.6156 | 1.7008 | 1.6413 | 1.6413 | 1.7008 | -122.6156 | -63.9173 | -123.1481 | -63.8871 |
| 0.6658 | 4.3478 | 1200 | 0.6950 | 0.2165 | -0.0405 | 0.3553 | 0.2570 | -63.9680 | -122.7150 | 1.6985 | 1.6382 | 1.6382 | 1.6985 | -122.7150 | -63.9680 | -123.1481 | -63.8871 |
| 0.6774 | 4.7101 | 1300 | 0.6833 | 0.3216 | 0.0373 | 0.3289 | 0.2843 | -63.8124 | -122.5048 | 1.6956 | 1.6371 | 1.6371 | 1.6956 | -122.5048 | -63.8124 | -123.1481 | -63.8871 |
| 0.6553 | 5.0725 | 1400 | 0.6871 | 0.4489 | 0.0096 | 0.3421 | 0.4393 | -63.8679 | -122.2503 | 1.6926 | 1.6324 | 1.6324 | 1.6926 | -122.2503 | -63.8679 | -123.1481 | -63.8871 |
| 0.655 | 5.4348 | 1500 | 0.6900 | 0.3867 | 0.0004 | 0.3553 | 0.3863 | -63.8863 | -122.3746 | 1.7037 | 1.6446 | 1.6446 | 1.7037 | -122.3746 | -63.8863 | -123.1481 | -63.8871 |
| 0.6552 | 5.7971 | 1600 | 0.6981 | 0.2816 | -0.0683 | 0.3158 | 0.3498 | -64.0236 | -122.5849 | 1.6935 | 1.6342 | 1.6342 | 1.6935 | -122.5849 | -64.0236 | -123.1481 | -63.8871 |
| 0.6471 | 6.1594 | 1700 | 0.7017 | 0.3683 | 0.0204 | 0.3553 | 0.3479 | -63.8463 | -122.4115 | 1.6992 | 1.6385 | 1.6385 | 1.6992 | -122.4115 | -63.8463 | -123.1481 | -63.8871 |
| 0.6557 | 6.5217 | 1800 | 0.6957 | 0.2688 | -0.0975 | 0.3026 | 0.3663 | -64.0820 | -122.6105 | 1.6947 | 1.6337 | 1.6337 | 1.6947 | -122.6105 | -64.0820 | -123.1481 | -63.8871 |
| 0.6516 | 6.8841 | 1900 | 0.6872 | 0.3905 | 0.0084 | 0.3553 | 0.3821 | -63.8704 | -122.3671 | 1.7002 | 1.6400 | 1.6400 | 1.7002 | -122.3671 | -63.8704 | -123.1481 | -63.8871 |
| 0.6542 | 7.2464 | 2000 | 0.6910 | 0.3410 | 0.0003 | 0.3289 | 0.3406 | -63.8864 | -122.4661 | 1.6915 | 1.6320 | 1.6320 | 1.6915 | -122.4661 | -63.8864 | -123.1481 | -63.8871 |
| 0.6629 | 7.6087 | 2100 | 0.6930 | 0.4245 | 0.0306 | 0.3026 | 0.3939 | -63.8259 | -122.2991 | 1.6968 | 1.6376 | 1.6376 | 1.6968 | -122.2991 | -63.8259 | -123.1481 | -63.8871 |
| 0.6427 | 7.9710 | 2200 | 0.7009 | 0.3500 | 0.0298 | 0.3289 | 0.3202 | -63.8274 | -122.4480 | 1.6952 | 1.6350 | 1.6350 | 1.6952 | -122.4480 | -63.8274 | -123.1481 | -63.8871 |

### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1
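## Inference example

A minimal inference sketch with `transformers`, using the chat template shipped with the base model. The repository id below is a placeholder; substitute the path where this checkpoint is actually saved or pushed.

```python
# Minimal chat inference sketch. "your-org/ds_chat_sigmoid_iter0" is a
# placeholder id; point it at wherever this checkpoint actually lives.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/ds_chat_sigmoid_iter0"  # placeholder checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```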