Commit 30917ac by sahilsuneja (1 parent: 057ee41)
speedup update
README.md CHANGED
@@ -5,6 +5,7 @@ license: apache-2.0
 ## Description
 
 This model is intended to be used as an accelerator for [Granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct) and takes inspiration from the Medusa speculative decoding architecture.
+Preliminary evaluations show up to a 2.2x speedup in tokens/step when using the accelerator with the base model.
 This accelerator modifies the MLP into a multi-stage MLP, where each stage predicts
 a single token in the draft based on both a state vector and sampled token
 from the prior stage (the base model can be considered stage 0).
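
For reviewers unfamiliar with the design, the multi-stage MLP described in the README text can be illustrated with a toy sketch. This is not the released accelerator code: the class name `ToySpeculator`, the layer shapes, the `tanh` activation, the greedy token choice, and the random weights are all assumptions made for illustration. It only shows the data flow the README describes, where each stage takes the state vector and the token from the prior stage and predicts one draft token.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    """Small random weights; a real accelerator would load trained ones."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToySpeculator:
    """Hypothetical multi-stage MLP draft head (illustrative only)."""

    def __init__(self, hidden, vocab, n_stages, seed=0):
        rng = random.Random(seed)
        # One embedding table, state projection, and output head per stage.
        self.emb = [rand_matrix(vocab, hidden, rng) for _ in range(n_stages)]
        self.proj = [rand_matrix(hidden, hidden, rng) for _ in range(n_stages)]
        self.head = [rand_matrix(vocab, hidden, rng) for _ in range(n_stages)]

    def draft(self, state, token):
        """Return one draft token per stage.

        `state` and `token` come from the base model (stage 0): its final
        hidden state and the token it just sampled.
        """
        tokens = []
        for emb, proj, head in zip(self.emb, self.proj, self.head):
            # Combine the prior stage's state vector with its chosen token.
            mixed = [p + e for p, e in zip(matvec(proj, state), emb[token])]
            state = [math.tanh(v) for v in mixed]
            logits = matvec(head, state)
            # Greedy choice here; sampling is an equally valid assumption.
            token = max(range(len(logits)), key=logits.__getitem__)
            tokens.append(token)
        return tokens


spec = ToySpeculator(hidden=8, vocab=16, n_stages=3)
draft = spec.draft(state=[0.0] * 8, token=5)
print(draft)  # three draft token ids, one per stage
```

In speculative decoding, the base model would then verify these draft tokens in a single forward pass and keep the longest accepted prefix, which is where a tokens/step gain such as the one reported in this commit would come from.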