Using accelerating libraries such as DeepSpeed
#8 opened by marccasals
I am trying to load the model using the DeepSpeed library. Is it possible to optimize this model with DeepSpeed? I have tried setting
replace_with_kernel_inject=True
but it doubled the amount of GPU RAM needed. Is there any solution?
When evaluating with lm-eval-harness, the model does seem to work with accelerate. After all, this model should be fully compatible with LLaMA, so any inference tricks that work for LLaMA should apply to OpenLLaMA.
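Since OpenLLaMA reuses the LLaMA architecture, a standard transformers load with accelerate's automatic device placement should work. A minimal sketch, assuming the `openlm-research/open_llama_7b` checkpoint; running it requires downloading the weights and enough GPU/CPU memory, and `accelerate` must be installed for `device_map="auto"`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; substitute the OpenLLaMA variant you use.
model_name = "openlm-research/open_llama_7b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # fp16 roughly halves memory vs. fp32
    device_map="auto",          # accelerate shards layers across devices
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in fp16 with `device_map="auto"` avoids the duplicated-memory problem seen with kernel injection, since weights are placed directly on their target devices rather than materialized twice.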
young-geng changed discussion status to closed