ibm-granite/granite-3.0-8b-instruct-accelerator


sahilsuneja committed on Oct 16

Commit 30917ac • 1 Parent(s): 057ee41

speedup update

Files changed (1)

README.md +1 -0


@@ -5,6 +5,7 @@ license: apache-2.0
 ## Description
 This model is intended to be used as an accelerator for [Granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct) and takes inspiration from the Medusa speculative decoding architecture.
+Preliminary evaluations show up to a 2.2x speedup in tokens/step when using the accelerator with the base model.
 This accelerator modifies the MLP into a multi-stage MLP, where each stage predicts
 a single token in the draft based on both a state vector and sampled token
 from the prior stage (the base model can be considered stage 0).
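
The multi-stage drafting described in the README can be illustrated with a toy sketch. This is not the accelerator's actual implementation: the weight names (`emb`, `w_state`, `w_tok`, `w_out`), the `tanh` activation, the dimensions, and the random weights are all assumptions made for illustration, and greedy argmax stands in for sampling. It only shows the data flow, where each stage consumes the previous stage's state vector and chosen token and emits one draft token. Verification of the draft by the base model is not shown.

```python
import math
import random


def matvec(vec, mat):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def draft_tokens(state, last_token, stages):
    """Produce one draft token per stage (hypothetical layout, greedy choice)."""
    draft = []
    for stage in stages:
        # Combine the prior state vector with the embedding of the prior token.
        tok_emb = stage["emb"][last_token]
        mixed = [
            a + b
            for a, b in zip(matvec(state, stage["w_state"]), matvec(tok_emb, stage["w_tok"]))
        ]
        state = [math.tanh(x) for x in mixed]
        # Project the new state to vocabulary logits and pick the top token.
        logits = matvec(state, stage["w_out"])
        last_token = max(range(len(logits)), key=logits.__getitem__)
        draft.append(last_token)
    return draft


def random_stage(rng, d, vocab):
    """Build one stage with random weights (illustrative only)."""
    def rand(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
    return {"emb": rand(vocab, d), "w_state": rand(d, d), "w_tok": rand(d, d), "w_out": rand(d, vocab)}


rng = random.Random(0)
D, VOCAB, N_STAGES = 8, 16, 3  # assumed toy sizes
stages = [random_stage(rng, D, VOCAB) for _ in range(N_STAGES)]
# Stage 0 is the base model: it supplies the initial state and last token.
base_state = [rng.uniform(-1, 1) for _ in range(D)]
draft = draft_tokens(base_state, last_token=5, stages=stages)
print(draft)
```

With `N_STAGES = 3` the sketch drafts three tokens per step. In a real speculative decoding loop the base model would then check those tokens and keep only the accepted prefix, which is where a tokens/step gain would come from.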