Hastagaras committed
Commit 6baeb5f
Parent: a03d93b

Update README.md

Files changed (1)
  README.md +18 -9
README.md CHANGED
@@ -8,20 +8,29 @@ tags:
  - merge

  ---
- # model

- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

- ## Merge Details
- ### Merge Method

- This model was merged using the SLERP merge method.

- ### Models Merged

- The following models were included in the merge:
- * [Hastagaras/anjrit](https://huggingface.co/Hastagaras/anjrit)
- * [Hastagaras/anying](https://huggingface.co/Hastagaras/anying)

+ # This model is a merge of my two models

+ **Anjrit:** This model is similar to my [Halu Blackroot](https://huggingface.co/Hastagaras/Halu-8B-Llama3-Blackroot) model, but it is built on the OAS (orthogonalized) version instead of the standard one.

+ **Anying:** This model is also similar to Halu Blackroot, but instead of using the Model Stock method, I merged the Blackroot LoRA manually with a very low alpha, as sketched below.
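
A minimal, illustrative sketch of merging a LoRA into a base model at a reduced alpha with Hugging Face PEFT; the paths and the alpha value are placeholders, not the exact ones used for Anying:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel

BASE = "path/to/oas-base-model"   # placeholder for the OAS base model
LORA = "path/to/blackroot-lora"   # placeholder for the Blackroot LoRA

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# Lower lora_alpha before attaching the adapter; the LoRA delta is scaled by
# lora_alpha / r, so a small alpha blends the adapter in only weakly.
cfg = LoraConfig.from_pretrained(LORA)
cfg.lora_alpha = 4  # example value only

model = PeftModel.from_pretrained(base, LORA, config=cfg)
merged = model.merge_and_unload()  # bake the scaled LoRA into the base weights
merged.save_pretrained("anying-merged")
```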
 
+ Both models have downsides: the Anjrit model lacks coherence, while the Anying model lacks a human-like quality.

+ I decided to merge both models with the following method:

+ 1. First, I compared the responses from each layer of both models using the baukit notebook (a capture sketch is shown after this list).

+ 2. After comparing both, it seems that around the bottom layers the Anjrit model is better, perhaps because it is unhinged.

+ 3. From the bottom to the middle layers, Anjrit is still better, but Anying seems smarter.

+ 4. At the middle layers both seem equal, but again, Anjrit is unhinged, so I prefer it here.

+ 5. From the middle to the top layers, Anying is better: it is smarter, and its responses are more structured.

+ 6. At the top layers, Anjrit is better since the model itself is orthogonalized, so I prefer it there.

+ 7. Then I performed SLERP with the configuration below. I don't know if this is really how a SLERP merge is supposed to work, so let's just say this is an experimental merge.
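
As a rough illustration of the per-layer comparison in step 1, here is a minimal sketch that captures each decoder layer's output with baukit's TraceDict; the model path and prompt are placeholders, and the actual notebook may inspect or decode the layers differently:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from baukit import TraceDict

MODEL = "path/to/anjrit-or-anying"  # placeholder; run once per model

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

# Trace every decoder layer of the Llama architecture in a single forward pass.
layer_names = [f"model.layers.{i}" for i in range(model.config.num_hidden_layers)]
inputs = tok("Write a short story about a cat.", return_tensors="pt")

with torch.no_grad(), TraceDict(model, layer_names) as traces:
    model(**inputs)

for name in layer_names:
    out = traces[name].output
    hidden = out[0] if isinstance(out, tuple) else out  # layer returns a tuple
    print(name, tuple(hidden.shape), hidden.float().norm().item())
```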

  ### Configuration