---
base_model: daisd-ai/anydef-orpo-v2
tags:
- entity linking
datasets:
- arynkiewicz/anydef-kilt-tasks-v2
model-index:
- name: daisd-ai/anydef-v2-linear-W4A16
  results: []
license: apache-2.0
inference: false
---

## Introduction

This model is a quantized version of a linear merge of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) and [daisd-ai/anydef-orpo-v2](https://huggingface.co/daisd-ai/anydef-orpo-v2).

## Merging

The models were merged to improve the quality of the final model ([idea](https://www.reddit.com/r/LocalLLaMA/comments/1fyx27y/im_pretty_happy_with_how_my_method_worked_out/)) and to limit the quality loss incurred during quantization.

Merging was done using [mergekit](https://github.com/arcee-ai/mergekit) with the following spec:

```yaml
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.3
  - model: daisd-ai/anydef-orpo-v2
    parameters:
      weight: 0.7
merge_method: linear
dtype: bfloat16
```

## Quantization

Quantization was applied using [LLM Compressor](https://github.com/vllm-project/llm-compressor) with 512 random examples from the [anydef-kilt-tasks-v2](https://huggingface.co/datasets/daisd-ai/anydef-kilt-tasks-v2) dataset as calibration data. We also tested larger calibration sets but did not see a noticeable improvement in the quantized model.

The recipe for quantization:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]
```

An end-to-end calibration sketch is included after the benchmark tables below.

## Inference

For inference code, check our [github](https://github.com/daisd-ai/universal-el). A minimal vLLM loading example is also shown at the end of this card.

## Benchmark results

Precision (%):

| Dataset     | anydef-v2 | anydef-v2-quant (this model) |
|-------------|-----------|------------------------------|
| RSS-500     | 66.89     | 64.90                        |
| ISTEX-1000  | 85.82     | 84.33                        |
| Reuters-128 | 64.88     | 68.28                        |
| TweekiGold  | 75.93     | 75.93                        |

Retrieval rate (%):

| Dataset     | anydef-v2 | anydef-v2-quant (this model) |
|-------------|-----------|------------------------------|
| RSS-500     | 84.11     | 83.44                        |
| ISTEX-1000  | 97.76     | 97.31                        |
| Reuters-128 | 83.33     | 83.87                        |
| TweekiGold  | 91.67     | 91.44                        |
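
For reference, the snippet below sketches how the calibration pass described in the Quantization section can be run end to end with LLM Compressor. It is a minimal sketch, not the exact script used for this model: the merged-model path, the dataset split and `text` column, and the 2048-token sequence length are assumptions, and the `oneshot` import path differs slightly between llm-compressor releases.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot  # in newer releases: from llmcompressor import oneshot

# Placeholder path for the merged (pre-quantization) model produced by the mergekit spec above.
MODEL_ID = "path/to/merged-model"
SAVE_DIR = "anydef-v2-linear-W4A16"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# 512 random calibration examples, as described in the Quantization section.
# The split name and the `text` column are assumptions; adapt to the dataset's actual schema.
ds = load_dataset("arynkiewicz/anydef-kilt-tasks-v2", split="train")
ds = ds.shuffle(seed=42).select(range(512))

def tokenize(sample):
    return tokenizer(sample["text"], max_length=2048, truncation=True, add_special_tokens=False)

ds = ds.map(tokenize, remove_columns=ds.column_names)

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

# One-shot calibration pass: applies SmoothQuant, then GPTQ weight-only quantization (W4A16).
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```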
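
As a complement to the repository linked in the Inference section, the following is a minimal sketch of loading the W4A16 checkpoint with vLLM. The prompt is a placeholder; the actual entity-linking prompt format is defined in the linked repository.

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint; vLLM detects the compressed-tensors W4A16 format automatically.
llm = LLM(model="daisd-ai/anydef-v2-linear-W4A16")
params = SamplingParams(temperature=0.0, max_tokens=64)

# Placeholder prompt; see the linked repository for the prompt format used for entity linking.
outputs = llm.generate(["<entity-linking prompt goes here>"], params)
print(outputs[0].outputs[0].text)
```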