	---
	license: llama3.1
	language:
	- en
	library_name: transformers
	tags:
	- mergekit
	- merge
	base_model:
	- meta-llama/Meta-Llama-3.1-70B-Instruct
	- turboderp/Cat-Llama-3-70B-instruct
	- Nexusflow/Athene-70B
	---

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/649dc85249ae3a68334adcc6/KxaiZ7rDKkYlix99O9j5H.png)

	Cathallama
	=====================================

	Awesome model, my new daily driver.

	Edit: Since release I have been seeing many generated tokens that decode to unknown Unicode code points. This did not show up during testing, so I have stopped using this model and am working on a new version.

	Notable Performance

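	* 9 percentage points higher overall success rate on MMLU-PRO than Meta-Llama-3.1-70B-Instruct, both at Q4_0 (51.0% vs. 42.0%)
	* Highest or tied-highest score in 11 of the 14 MMLU-PRO categories tested
	* Passed 11 of 14 manual test cases, the most of the four models tested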

	Creation workflow
	=====================
	Models merged
	* meta-llama/Meta-Llama-3.1-70B-Instruct
	* turboderp/Cat-Llama-3-70B-instruct
	* Nexusflow/Athene-70B

	```
	flowchart TD
	A[Nexusflow_Athene] -->|Merge with| B[Meta-Llama-3.1]
	C[turboderp_Cat] -->|Merge with| D[Meta-Llama-3.1]
	B --> E[Merge]
	D --> E[Merge]
	E[Merge] -->|Result| F[Cathallama]
	```


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/649dc85249ae3a68334adcc6/bBcB194tAtsZjPUnI1pDQ.png)

	Testing
	=====================

	Hyperparameters
	---------------

	* Temperature: 0.0 for automated, 0.9 for manual
	* Penalize repeat sequence: 1.05
	* Consider N tokens for penalize: 256
	* Penalize repetition of newlines
	* Top-K sampling: 40
	* Top-P sampling: 0.95
	* Min-P sampling: 0.05
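
The settings above correspond to request fields accepted by the llama.cpp server `/completion` endpoint. The sketch below builds such a request body in Python. It is an illustration only: this README does not say which frontend was used for the tests, the newline penalty is assumed to be enabled, and the function name, the `n_predict` value, and the example prompt are placeholders.

```python
import json

# Sampling settings from the list above, keyed by llama.cpp server
# /completion field names (assumed frontend, not confirmed by this README).
SAMPLING = {
    "repeat_penalty": 1.05,
    "repeat_last_n": 256,
    "penalize_nl": True,  # assumption: the newline penalty was switched on
    "top_k": 40,
    "top_p": 0.95,
    "min_p": 0.05,
}

def build_completion_payload(prompt: str, automated: bool, n_predict: int = 512) -> dict:
    """Return a request body with temperature 0.0 (automated) or 0.9 (manual)."""
    payload = dict(SAMPLING)
    payload["temperature"] = 0.0 if automated else 0.9
    payload["prompt"] = prompt
    payload["n_predict"] = n_predict  # placeholder limit, not from this README
    return payload

if __name__ == "__main__":
    body = build_completion_payload("Write a short poem.", automated=False)
    print(json.dumps(body, indent=2))
```

Keeping the shared settings in one dictionary and switching only the temperature makes it harder for automated and manual runs to drift apart by accident.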

	llama.cpp Version
	------------------

	* Build: b3527-2-g2d5dd7bb
	* Flags: `-fa -ngl -1 -ctk f16 --no-mmap`

	Tested Files
	------------------

	* Cathallama-70B.Q4_0.gguf
	* Nexusflow_Athene-70B.Q4_0.gguf
	* turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf
	* Meta-Llama-3.1-70B-Instruct.Q4_0.gguf

	Tests
	--------------


	Manual testing

	| Category | Test Case | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
	| --- | --- | --- | --- | --- | --- |
	| Common Sense | Ball on cup | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | OK |
	| | Big duck small horse | <span style="color: red;">KO</span> | OK | <span style="color: red;">KO</span> | OK |
	| | Killers | OK | OK | <span style="color: red;">KO</span> | OK |
	| | Strawberry r's | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
	| | 9.11 or 9.9 bigger | <span style="color: red;">KO</span> | OK | OK | <span style="color: red;">KO</span> |
	| | Dragon or lens | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
	| | Shirts | OK | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
	| | Sisters | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
	| | Jane faster | OK | OK | OK | OK |
	| Programming | JSON | OK | OK | OK | OK |
	| | Python snake game | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
	| Math | Door window combination | OK | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
	| Smoke | Poem | OK | OK | OK | OK |
	| | Story | OK | OK | <span style="color: red;">KO</span> | OK |

	Note: See [sample_generations.txt](https://huggingface.co/gbueno86/Cathallama-70B/blob/main/sample_generations.txt) in the root folder of the repo for the raw generations.

	MMLU-PRO

	| Model | Success % |
	| --- | --- |
	| Cathallama-70B.Q4_0.gguf | 51.0% |
	| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 37.0% |
	| Nexusflow_Athene-70B.Q4_0.gguf | 41.0% |
	| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 42.0% |
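
The gap between Cathallama and Meta-Llama-3.1-70B-Instruct in this table is 9 percentage points, which is not the same thing as a 9% relative improvement. The short calculation below shows both readings. The shortened model labels and the `gain` helper are just names chosen for this sketch.

```python
# Overall MMLU-PRO success rates from the table above, in percent.
OVERALL = {
    "Cathallama-70B": 51.0,
    "Cat-Llama-3-70B-instruct": 37.0,
    "Athene-70B": 41.0,
    "Meta-Llama-3.1-70B-Instruct": 42.0,
}

def gain(candidate: str, baseline: str) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    points = OVERALL[candidate] - OVERALL[baseline]
    relative = points / OVERALL[baseline] * 100
    return points, relative

points, relative = gain("Cathallama-70B", "Meta-Llama-3.1-70B-Instruct")
print(f"{points:.1f} points, {relative:.1f}% relative")  # 9.0 points, 21.4% relative
```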

	| MMLU-PRO category | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
	| --- | --- | --- | --- | --- |
	| Business | 50.0% | 45.0% | 20.0% | 40.0% |
	| Law | 40.0% | 30.0% | 30.0% | 35.0% |
	| Psychology | 85.0% | 80.0% | 70.0% | 75.0% |
	| Biology | 80.0% | 70.0% | 85.0% | 80.0% |
	| Chemistry | 55.0% | 40.0% | 35.0% | 35.0% |
	| History | 65.0% | 60.0% | 55.0% | 65.0% |
	| Other | 55.0% | 50.0% | 45.0% | 50.0% |
	| Health | 75.0% | 40.0% | 60.0% | 65.0% |
	| Economics | 80.0% | 75.0% | 65.0% | 70.0% |
	| Math | 45.0% | 35.0% | 15.0% | 40.0% |
	| Physics | 50.0% | 45.0% | 45.0% | 45.0% |
	| Computer Science | 60.0% | 55.0% | 55.0% | 60.0% |
	| Philosophy | 55.0% | 60.0% | 45.0% | 50.0% |
	| Engineering | 35.0% | 40.0% | 25.0% | 35.0% |

	Note: The overall MMLU-PRO score was measured on 100 questions. Each category score was measured on 20 questions from that category.
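
With samples this small, a single question moves the scores noticeably. The sketch below gives a rough feel for that. It assumes the questions are an independent random sample, which this README does not state, and it uses the normal approximation for a proportion. The function names are illustrative.

```python
import math

def step_size(n_questions: int) -> float:
    """Percentage points gained or lost per question answered differently."""
    return 100.0 / n_questions

def margin_95(success_pct: float, n_questions: int) -> float:
    """Approximate 95% margin of error in percentage points (normal approximation)."""
    p = success_pct / 100.0
    return 1.96 * math.sqrt(p * (1.0 - p) / n_questions) * 100.0

print(step_size(20))                   # 5.0
print(round(margin_95(51.0, 100), 1))  # 9.8
```

Under these assumptions, each category score moves in 5-point steps, and a single 100-question overall score carries a margin of error of roughly 10 points.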

	PubMedQA

	| Model | Success % |
	| --- | --- |
	| Cathallama-70B.Q4_0.gguf | 73.00% |
	| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 76.00% |
	| Nexusflow_Athene-70B.Q4_0.gguf | 67.00% |
	| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 72.00% |


	Request
	--------------
	If you are hiring in the EU or can sponsor a visa, PM me :D


	PS. Thank you mradermacher for the GGUFs!
