clembench-playpen
/

meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D30003

Generated from Trainer

Model card Files Files and versions Metrics Training metrics Community

Nicohst commited on Sep 30

Commit

cc66773

•

1 Parent(s): ec848cb

Update README.md

Files changed (1) hide show

README.md +63 -7

README.md CHANGED Viewed

@@ -22,17 +22,73 @@ This model is a fine-tuned version of [unsloth/meta-llama-3.1-8b-instruct-bnb-4b
 ## Model description
-More information needed
-## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
-## Training procedure
 ### Training hyperparameters

 ## Model description
+This model was trained on Successful episodes of the top 3 model similar to [D20002](clembench-playpen/meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D20002) but instead of using the whole episode as input,
+each episode was split into conversation pieces.
+e.g.
+```json
+[
+{
+  role: 'user'
+  content: '...'
+},
+{
+  role: 'assistant'
+  content: '...'
+},
+{
+  role: 'user'
+  content: '...'
+},
+{
+  role: 'assistant'
+  content: '...'
+},
+]
+```
+```json
+is split int:
+[
+{
+  role: 'user'
+  content: '...'
+},
+{
+  role: 'assistant'
+  content: '...'
+},
+```
+and
+```json
+[
+{
+  role: 'user'
+  content: '...'
+},
+{
+  role: 'assistant'
+  content: '...'
+},
+{
+  role: 'user'
+  content: '...'
+},
+{
+  role: 'assistant'
+  content: '...'
+},
+]
+```
 ## Training and evaluation data
+After splitting, the dataset contains about 4122 conversation bits accross all games.
+The Dataset ID is D30003
 ### Training hyperparameters