---
base_model: daisd-ai/anydef-orpo-v2
tags:
- entity linking
datasets:
- arynkiewicz/anydef-kilt-tasks-v2
model-index:
- name: daisd-ai/anydef-v2-linear-W4A16
  results: []
license: apache-2.0
inference: false
---

## Introduction

This model is a quantized version of a linear merge of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) and [daisd-ai/anydef-orpo-v2](https://huggingface.co/daisd-ai/anydef-orpo-v2).

## Merging

The models were merged to improve the quality of the final model ([idea](https://www.reddit.com/r/LocalLLaMA/comments/1fyx27y/im_pretty_happy_with_how_my_method_worked_out/)) and to reduce the quality loss incurred during quantization. Merging was done with [mergekit](https://github.com/arcee-ai/mergekit) using the following spec:
```yaml
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.3
  - model: daisd-ai/anydef-orpo-v2
    parameters:
      weight: 0.7
merge_method: linear
dtype: bfloat16
```
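For reference, a spec like this can be executed with mergekit's `mergekit-yaml` CLI or through its Python API. The snippet below is a minimal sketch using the Python API; the config filename and output path are illustrative, not the exact commands used for this release.

```python
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the linear merge spec shown above (path is illustrative).
with open("anydef-v2-linear.yaml", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Run the merge; the output directory is then ready for quantization.
run_merge(
    merge_config,
    out_path="./anydef-v2-linear",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # merge on GPU when available
        copy_tokenizer=True,             # copy tokenizer files into the output dir
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```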
## Quantization

Quantization was applied using [LLM Compressor](https://github.com/vllm-project/llm-compressor) with 512 random examples from the [anydef-kilt-tasks-v2](https://huggingface.co/datasets/daisd-ai/anydef-kilt-tasks-v2) dataset as calibration data.

We also tested other calibration-set sizes, but did not see a noticeable improvement from using more examples.

The recipe for quantization:
```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    # Smooth activation outliers before weight quantization
    SmoothQuantModifier(smoothing_strength=0.8),
    # 4-bit weights, 16-bit activations; leave the LM head unquantized
    GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]
```
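For completeness, the sketch below shows how such a recipe is typically applied with LLM Compressor's `oneshot` entry point. The model path, dataset split, sequence length, and output directory are illustrative assumptions rather than the exact script used for this checkpoint.

```python
from datasets import load_dataset
from llmcompressor.transformers import oneshot

MODEL_PATH = "./anydef-v2-linear"          # merged bfloat16 model (illustrative path)
SAVE_DIR = "./anydef-v2-linear-W4A16"
NUM_SAMPLES = 512

# 512 random calibration examples; the split name and any prompt formatting are
# illustrative and must match the dataset schema (see the llm-compressor docs).
ds = load_dataset("arynkiewicz/anydef-kilt-tasks-v2", split="train")
ds = ds.shuffle(seed=42).select(range(NUM_SAMPLES))

# One-shot SmoothQuant + GPTQ calibration using the recipe defined above.
# In newer llm-compressor releases, `oneshot` is imported from `llmcompressor` directly.
oneshot(
    model=MODEL_PATH,
    dataset=ds,
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=NUM_SAMPLES,
    output_dir=SAVE_DIR,
)
```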
## Inference

For inference code, see our [GitHub repository](https://github.com/daisd-ai/universal-el).
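Since the checkpoint is stored in W4A16 (compressed-tensors) format, it can also be loaded directly with [vLLM](https://github.com/vllm-project/vllm). The snippet below is a minimal sketch; the prompt shown is only a placeholder, as the actual entity-linking prompt template and output parsing are implemented in the repository above.

```python
from vllm import LLM, SamplingParams

# Load the quantized W4A16 checkpoint with vLLM.
llm = LLM(model="daisd-ai/anydef-v2-linear-W4A16")

# Placeholder prompt; use the entity-linking prompt template from the repository in practice.
prompts = ["Link the entity mentioned in: 'Apple released a new iPhone in September.'"]

outputs = llm.generate(prompts, SamplingParams(temperature=0.0, max_tokens=128))
for out in outputs:
    print(out.outputs[0].text)
```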
## Benchmark results

Precision (%):

| Dataset | anydef-v2 | anydef-v2-quant (this) |
|------------|------------|------------|
| RSS-500 | 66.89 | 64.90 |
| ISTEX-1000 | 85.82 | 84.33 |
| Reuters-128 | 64.88 | 68.28 |
| TweekiGold | 75.93 | 75.93 |

Retrieval rate (%):

| Dataset | anydef-v2 | anydef-v2-quant (this) |
|------------|------------|------------|
| RSS-500 | 84.11 | 83.44 |
| ISTEX-1000 | 97.76 | 97.31 |
| Reuters-128 | 83.33 | 83.87 |
| TweekiGold | 91.67 | 91.44 |