mgoin committed
Commit 0bcd329
Parent: d262277

Update README.md

Files changed (1):
  README.md (+4, −3)
README.md CHANGED
@@ -15,13 +15,13 @@ tags:
 
 # OpenHermes 2.5 Mistral 7B - DeepSparse
 
-This repo contains [DeepSparse](https://github.com/neuralmagic/deepsparse), a sparsity-aware CPU inference runtime, model files for [Teknium's OpenHermes 2.5 Mistral 7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B).
+This repo contains model files for [Teknium's OpenHermes 2.5 Mistral 7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.
 
 This model was quantized and pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
 
 ## Inference
 
-Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse):
+Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:
 ```
 pip install deepsparse-nightly[llm]
 ```
@@ -52,7 +52,7 @@ That's a difficult question as there are many people who inspire me. However, on
 
 ## Sparsification
 
-See the `recipe.yaml` in this repo and follow the instructions below.
+For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.
 
 ```bash
 git clone https://github.com/neuralmagic/sparseml
@@ -62,6 +62,7 @@ python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task t
 cp deployment/model.onnx deployment/model-orig.onnx
 ```
 
+Run this kv-cache injection afterwards:
 ```python
 import os
 import onnx
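The inference section of the updated README boils down to: install the runtime, then point a `TextGeneration` pipeline at the model files. A minimal sketch under stated assumptions — the ChatML prompt helper reflects the chat format OpenHermes 2.5 was trained on, and the model path passed to `run` is the reader's choice (e.g. the exported `deployment` directory); none of this code is part of the commit itself:

```python
# Sketch of querying the sparse model with DeepSparse's TextGeneration pipeline.
# Assumes `pip install deepsparse-nightly[llm]` has been run (see the README diff
# above). The model path is an assumption, not taken from this commit.

def format_chatml(system: str, user: str) -> str:
    """Build a ChatML prompt, the chat format OpenHermes 2.5 was trained on."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def run(model_path: str, prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a completion. deepsparse is imported lazily so the prompt
    helper above stays usable even without the runtime installed."""
    from deepsparse import TextGeneration
    pipeline = TextGeneration(model=model_path)
    return pipeline(prompt, max_new_tokens=max_new_tokens).generations[0].text
```

Usage would look like `run("deployment", format_chatml("You are a helpful assistant.", "Who inspires you?"))`, mirroring the example answer quoted in the hunk header above.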