Hastagaras committed
Commit 6baeb5f
Parent: a03d93b

Update README.md

Files changed (1)
  README.md +18 -9
README.md CHANGED
@@ -8,20 +8,29 @@ tags:
  - merge

  ---
- # model

- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

- ## Merge Details
- ### Merge Method

- This model was merged using the SLERP merge method.

- ### Models Merged

- The following models were included in the merge:
- * [Hastagaras/anjrit](https://huggingface.co/Hastagaras/anjrit)
- * [Hastagaras/anying](https://huggingface.co/Hastagaras/anying)

+ # This model is a merge of my two models

+ **Anjrit:** This model is similar to my [Halu Blackroot](https://huggingface.co/Hastagaras/Halu-8B-Llama3-Blackroot) model, but it is built on the OAS (orthogonalized) version instead of the standard one.

+ **Anying:** This model is also similar to Halu Blackroot, but instead of using the Model Stock method, I merged the Blackroot LoRA manually with a very low alpha, as sketched below.
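
A minimal, illustrative sketch of merging a LoRA into a base model at a reduced alpha with Hugging Face PEFT; the paths and the alpha value are placeholders, not the exact ones used for Anying:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel

BASE = "path/to/oas-base-model"   # placeholder for the OAS base model
LORA = "path/to/blackroot-lora"   # placeholder for the Blackroot LoRA

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# Lower lora_alpha before attaching the adapter; the LoRA delta is scaled by
# lora_alpha / r, so a small alpha blends the adapter in only weakly.
cfg = LoraConfig.from_pretrained(LORA)
cfg.lora_alpha = 4  # example value only

model = PeftModel.from_pretrained(base, LORA, config=cfg)
merged = model.merge_and_unload()  # bake the scaled LoRA into the base weights
merged.save_pretrained("anying-merged")
```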
 
+ Both models have downsides: the Anjrit model lacks coherence, while the Anying model lacks a human-like quality.

+ I decided to merge both models with the following method:

+ 1. First, I compared the responses from each layer of both models using the baukit notebook (a capture sketch is shown after this list).

+ 2. After comparing both, it seems that around the bottom layers the Anjrit model is better, perhaps because it is unhinged.

+ 3. From the bottom to the middle layers, Anjrit is still better, but Anying seems smarter.

+ 4. At the middle layers both seem equal, but again, Anjrit is unhinged, so I prefer it here.

+ 5. From the middle to the top layers, Anying is better: it is smarter, and its responses are more structured.

+ 6. At the top layers, Anjrit is better since the model itself is orthogonalized, so I prefer it there.

+ 7. Then I performed SLERP with the configuration below. I don't know if this is really how a SLERP merge is supposed to work, so let's just say this is an experimental merge.
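
As a rough illustration of the per-layer comparison in step 1, here is a minimal sketch that captures each decoder layer's output with baukit's TraceDict; the model path and prompt are placeholders, and the actual notebook may inspect or decode the layers differently:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from baukit import TraceDict

MODEL = "path/to/anjrit-or-anying"  # placeholder; run once per model

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

# Trace every decoder layer of the Llama architecture in a single forward pass.
layer_names = [f"model.layers.{i}" for i in range(model.config.num_hidden_layers)]
inputs = tok("Write a short story about a cat.", return_tensors="pt")

with torch.no_grad(), TraceDict(model, layer_names) as traces:
    model(**inputs)

for name in layer_names:
    out = traces[name].output
    hidden = out[0] if isinstance(out, tuple) else out  # layer returns a tuple
    print(name, tuple(hidden.shape), hidden.float().norm().item())
```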

  ### Configuration