Shangming Cai committed
Commit 92449b3 • 1 Parent(s): 2cf2e83
update README.md
README.md
CHANGED
@@ -67,6 +67,10 @@ cd flash-attention && pip install .
# pip install csrc/layer_norm
# pip install csrc/rotary
```
+
+如果您有更高推理性能方面的需求，但上述可选加速项`layer_norm`及`rotary`未能安装成功，或是您所使用的GPU不满足`flash-attention`库所要求的NVIDIA Ampere/Ada/Hopper架构，您可以尝试切换至dev_triton分支，使用该分支下基于Triton实现的推理加速方案。该方案适用于更宽范围的GPU产品，在pytorch 2.0及以上版本原生支持，无需额外安装操作。
+
+If you need higher inference performance but the optional acceleration modules above (`layer_norm` and `rotary`) failed to install, or your GPU does not meet the NVIDIA Ampere/Ada/Hopper architecture required by the `flash-attention` library, you can switch to the dev_triton branch and try the Triton-based inference acceleration implemented there. This solution works on a wider range of GPUs, is natively supported on PyTorch 2.0 and above, and requires no extra package installation.
<br>
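For illustration, the sketch below shows one way the branch switch described in the added paragraph could be done when loading the model through `transformers`, using the `revision` argument of `from_pretrained` to select the dev_triton branch. The repository id, the `trust_remote_code` flag, and `device_map="auto"` are assumptions for the sake of the example, not part of this commit.

```python
# Hypothetical sketch (not part of this commit): load the model from the
# dev_triton branch instead of the default branch via the `revision` argument.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: placeholder repo id; replace with the actual id of this repository.
repo_id = "Qwen/Qwen-7B-Chat"

tokenizer = AutoTokenizer.from_pretrained(
    repo_id, revision="dev_triton", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, revision="dev_triton", trust_remote_code=True, device_map="auto"
).eval()
```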