aashish1904 committed
Commit 87553b5
Parent: ccc2a35

Upload README.md with huggingface_hub

Files changed (1): README.md added (+81 lines)
---
library_name: transformers
license: apache-2.0
base_model: Heralax/philosophy-llm-mistral-pretrain
tags:
- generated_from_trainer
model-index:
- name: philosophy-hardcore-pretraining
  results: []
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/philosophy-mistral-GGUF
This is a quantized version of [Heralax/philosophy-mistral](https://huggingface.co/Heralax/philosophy-mistral), created using llama.cpp.
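
Below is a minimal, hedged sketch of one way to load a quant from this repo with `llama-cpp-python` (the Python bindings for llama.cpp). The quant filename pattern and the example question are assumptions for illustration; check the repo's file list for the quant you actually want.

```python
# Minimal sketch: loading a GGUF quant from this repo via llama-cpp-python.
# The filename pattern is an assumption -- match it to a quant that actually
# exists in QuantFactory/philosophy-mistral-GGUF (e.g. Q4_K_M, Q8_0).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantFactory/philosophy-mistral-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quant; downloaded via huggingface_hub
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the value of philosophy, according to Russell?"}],
    temperature=0.0,  # greedy decoding; see the memorization note in the original card below
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```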

# Original Model Card

# Philosophy LLM

I would've trained this on Phi so I could've called it Phi-losophy, if I had thought of that joke before kicking off the run. Oh well. It's trained on Mistral instead. That's a Mist opportunity right there.

This is a narrow domain-expert LLM trained on the top 5 books on Gutenberg:

- The Problems of Philosophy (Bertrand Russell)
- Beyond Good and Evil (Nietzsche)
- Thus Spake Zarathustra: A Book for All and None (Nietzsche)
- The Prince (Machiavelli)
- Second Treatise of Government (John Locke)

It's meant to be an interesting novelty, showing off training on a specific domain. It has some quirks. Namely:

1. It seems to have memorized the training data very well. Ask a question that exists in the training data, with temp 0, and it will usually give you back the exact response word-for-word (see the sketch after this list). This means that, on the subjects covered by its data, it is very knowledgeable.
2. I forgot to include any generalist instruct data, so it's... not stupid, at least not particularly stupid by 7B standards, but it is very much limited to QA.
3. It's much less fluffy and wasteful with its responses than previous Augmentoolkit domain-expert models, thanks to a new dataset setting. This tends to make it respond with less detail, but it may also remember things better and get to the point more quickly.
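
As a concrete illustration of point 1, here is a hedged sketch of that temp-0 check run against the unquantized [Heralax/philosophy-mistral](https://huggingface.co/Heralax/philosophy-mistral) weights with `transformers`. The question is made up for illustration, and the sketch assumes the repo's tokenizer ships a chat template; if it doesn't, format the prompt by hand instead.

```python
# Sketch of the "temp 0" recall check: greedy decoding (do_sample=False)
# stands in for temperature 0. The question is only an example -- swap in
# one you know appears in the training data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Heralax/philosophy-mistral"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Why does Machiavelli argue it is safer to be feared than loved?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```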

Some example chats (blame LM Studio for not hiding the stop token):

Asking stuff from the training data:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64825ebceb4befee377cf8ac/AirHFo61iB1HAP-IXwnZn.png)

Asking a question directly from the training data, and one I came up with on the spot:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64825ebceb4befee377cf8ac/Ccm-EeDyOFcylCefwDS-W.png)

Some things that are kinda funny, but also show off the drawback of not using any generalist data:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64825ebceb4befee377cf8ac/e2sCBLIX8Xg91KSevGt_B.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64825ebceb4befee377cf8ac/P0bhWyENxOaxPvC4fE6jw.png)

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough `TrainingArguments` equivalent is sketched after this list):
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 6
- total_train_batch_size: 72
- total_eval_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 136
- num_epochs: 6
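
For readers who want to reproduce these settings, here is a rough sketch of how the values above map onto `transformers.TrainingArguments`. This is an assumption about the setup, not the actual training script: the card only reports the values, and the multi-GPU part (6 devices) would come from the launcher (e.g. `torchrun` or `accelerate`), not from these arguments.

```python
# Rough mapping of the reported hyperparameters onto TrainingArguments.
# Effective train batch size: 2 per device x 6 GPUs x 6 accumulation steps = 72.
# Effective eval batch size:  1 per device x 6 GPUs = 6.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="philosophy-hardcore-pretraining",
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=6,
    num_train_epochs=6,
    lr_scheduler_type="cosine",
    warmup_steps=136,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption -- the card does not state the training precision
)
```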

### Framework versions

- Transformers 4.45.0.dev0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1