sahilsuneja committed on
Commit 30917ac
1 Parent(s): 057ee41

speedup update

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -5,6 +5,7 @@ license: apache-2.0
  ## Description

  This model is intended to be used as an accelerator for [Granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct) and takes inspiration from the Medusa speculative decoding architecture.
+ Preliminary evaluations show up to a 2.2x speedup in tokens/step when using the accelerator with the base model.
  This accelerator modifies the MLP into a multi-stage MLP, where each stage predicts
  a single token in the draft based on both a state vector and sampled token
  from the prior stage (the base model can be considered stage 0).
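
The multi-stage drafting described in the README text can be sketched roughly as follows. This is a minimal illustration in plain Python, not the accelerator's actual implementation: the layer shapes, the `tanh` activation, the random weights, the use of greedy argmax in place of sampling, and all names (`make_stage`, `draft_tokens`, `HIDDEN`, `VOCAB`, `N_STAGES`) are assumptions made for the example. It only shows the data flow, where each stage consumes the previous stage's state vector and chosen token and emits one draft token.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_matrix(rows, cols, rng):
    """Random weights stand in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_stage(hidden, vocab, rng):
    """One hypothetical draft stage: token embedding, state projection, output head."""
    return {
        "embed": make_matrix(vocab, hidden, rng),  # token id -> embedding vector
        "proj": make_matrix(hidden, hidden, rng),  # previous state -> new state space
        "head": make_matrix(vocab, hidden, rng),   # state -> vocabulary logits
    }


def draft_tokens(state, token, stages):
    """Run the stages in order; each one predicts a single draft token."""
    draft = []
    for stage in stages:
        projected = matvec(stage["proj"], state)
        embedded = stage["embed"][token]
        # Combine the prior state with the prior token to form the new state.
        state = [math.tanh(p + e) for p, e in zip(projected, embedded)]
        logits = matvec(stage["head"], state)
        # Greedy choice for simplicity; the README describes a sampled token.
        token = max(range(len(logits)), key=logits.__getitem__)
        draft.append(token)
    return draft


rng = random.Random(0)
HIDDEN, VOCAB, N_STAGES = 8, 16, 3
stages = [make_stage(HIDDEN, VOCAB, rng) for _ in range(N_STAGES)]

# "Stage 0" is the base model: it supplies the initial state and token.
base_state = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]
base_token = 5

print(draft_tokens(base_state, base_token, stages))
```

In a full speculative decoding loop, the base model would then verify the drafted tokens in a single forward pass and keep the longest accepted prefix; that verification step is omitted here.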