ptrdvn committed on
Commit
dc49658
1 Parent(s): 698c992

Update README.md

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -1,3 +1,18 @@
  ---
  license: apache-2.0
  ---
+
+
+ # Base checkpoint
+ augmxnt/shisa-7b-v1
+ * Mistral-7B base
+ * Pre-trained on 8B tokens of MADLAD-Ja
+ * Fine-tuned on Japanese instructions
+ * Highest-scoring 7B model on the JA MT-Bench conversation benchmark
+
+ # Training datasets (total ~7B tokens)
+ * Aozora Bunko
+ * Japanese Law Precedent Dataset
+ * Japanese Wikipedia
+ * Web scrapes of .lg.jp, .go.jp, and .ac.jp domains from CulturaX (documents sharing the same first 25 characters were de-duplicated)
+ * English Ultrachat200K-gen (so that the model does not forget the English and chat ability learned in the base checkpoint)
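
The CulturaX filtering described in the dataset list can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: it assumes each record is a dict with `url` and `text` fields, and the function names (`is_allowed_domain`, `dedup_by_prefix`, `filter_and_dedup`) are hypothetical. Only the three domain suffixes and the 25-character rule come from the README.

```python
from urllib.parse import urlparse

# Taken from the README: the domain suffixes kept and the prefix length used for de-duplication.
ALLOWED_SUFFIXES = (".lg.jp", ".go.jp", ".ac.jp")
PREFIX_LEN = 25


def is_allowed_domain(url: str) -> bool:
    """Return True if the URL's hostname ends with one of the allowed suffixes."""
    host = urlparse(url).hostname or ""
    return host.endswith(ALLOWED_SUFFIXES)


def dedup_by_prefix(docs, prefix_len=PREFIX_LEN):
    """Keep the first document seen for each distinct leading `prefix_len` characters."""
    seen = set()
    kept = []
    for doc in docs:
        key = doc["text"][:prefix_len]
        if key in seen:
            continue  # a document with the same opening characters was already kept
        seen.add(key)
        kept.append(doc)
    return kept


def filter_and_dedup(docs):
    """Apply the domain filter first, then the prefix-based de-duplication."""
    return dedup_by_prefix([d for d in docs if is_allowed_domain(d["url"])])
```

Keying on a short prefix is cheap and catches pages that open with identical text, at the cost of occasionally dropping distinct documents that happen to start the same way. Whether the original pipeline filtered before de-duplicating, or the other way round, is not stated in the README.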