# Windows wheels of flash-attention

## Build CUDA wheel steps
- First, clone the code:

  ```bash
  git clone https://github.com/Dao-AILab/flash-attention
  cd flash-attention
  ```
- Switch to a tag branch, such as `v2.7.0.post2` (you can get the latest tag with `git describe --tags`, or list all available tags with `git tag -l`):

  ```bash
  git checkout -b v2.7.0.post2 v2.7.0.post2
  ```
- Download `WindowsWhlBuilder_cuda.bat` into the `flash-attention` directory.
- To build with MSVC, open the "Native Tools Command Prompt for Visual Studio". The exact name may depend on your version of Windows, Visual Studio, and CPU architecture (in my case it was "x64 Native Tools Command Prompt for VS 2022").
  (Screenshot: my Visual Studio Installer version.)
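  If you would rather stay in an existing terminal, the same build environment can usually be loaded by calling `vcvars64.bat` directly. This is an optional alternative, not part of the original steps; the path below assumes a default VS 2022 Community install and will differ for other editions:

  ```bat
  REM Assumed default path for VS 2022 Community; adjust edition and path to your install.
  call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
  ```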
- Switch to your target Python environment and make sure the corresponding torch CUDA build is installed.
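  A quick way to confirm the active environment has a CUDA build of torch (a minimal check, not part of the build script):

  ```python
  # Sanity check: verify torch is a CUDA build and a GPU is visible.
  import torch

  print(torch.__version__)          # e.g. "2.5.1+cu124" - the +cuXXX suffix is the CUDA version
  print(torch.version.cuda)         # CUDA version this torch build was compiled against
  print(torch.cuda.is_available())  # True if a compatible GPU and driver are present
  ```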
- Start the build task:
  ```bat
  REM Builds with 1 parallel worker by default (I used 8 workers on an i9-14900KF @ 3.20 GHz with 64 GB RAM, which took about 30 minutes).
  REM To change the number of workers, edit WindowsWhlBuilder_cuda.bat and modify `set MAX_JOBS=1`
  REM (I tried to set it via a parameter, but that failed); see the sketch after this block.
  WindowsWhlBuilder_cuda.bat

  REM Enable cxx11abi
  WindowsWhlBuilder_cuda.bat FORCE_CXX11_ABI=TRUE
  ```
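  For example, to build with 8 workers, the relevant line inside `WindowsWhlBuilder_cuda.bat` would be edited like this (the variable name is taken from the note above; the rest of the script stays unchanged):

  ```bat
  REM Inside WindowsWhlBuilder_cuda.bat: raise the parallel worker count from the default of 1.
  set MAX_JOBS=8
  ```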
- The wheel file will be placed in the `dist` directory.
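  To verify the result, install the generated wheel (the exact filename encodes your Python, torch, and CUDA versions, so it is left as a placeholder below) and run a short smoke test. The sketch uses `flash_attn_func` from the installed package and assumes a CUDA GPU with fp16 support:

  ```python
  # Smoke test for the freshly built wheel.
  # Install first, e.g.:  pip install dist\<generated-wheel-name>.whl
  import torch
  from flash_attn import flash_attn_func

  # flash-attention expects (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on the GPU.
  q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
  k = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
  v = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)

  out = flash_attn_func(q, k, v, causal=True)
  print(out.shape)  # torch.Size([1, 128, 8, 64])
  ```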