-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows platform support bringup #36
Comments
Got a build to succeed (run to completion with exit code 0) after disabling many subprojects and writing some local patches.
|
For 3: git diff master Branch1 > ../patchfile and check that in should be fine. apply patches functionality seems pretty straightforward: https://github.com/nod-ai/TheRock/blob/main/build_tools/fetch_sources.py |
FYI - for managing the patches directory, I've been making commits to the repo checked out git dirs. Then you dump them all with
Then if you re-run I was doing it the manual way via |
Sorry - I could have documented my flow on this better: key is that |
Thanks, the |
Progress on #36. This makes it easier to get set up on Windows, which is missing a package manager distribution of the repo tool (https://gerrit.googlesource.com/git-repo/). The tool is a standalone python script that can be run portably via `python repo`. Linux users will probably be more comfortable running it as an executable as just `repo`, but that isn't possible on Windows without some tricks. ## Sample log snippets Download (first run): ``` D:\projects\TheRock (windows-repo-download) λ python ./build_tools/fetch_sources.py Unable to find 'repo', downloading into script dir at D:\projects\TheRock\build_tools\repo Setting up repo in D:\projects\TheRock\sources ++ Exec [D:\projects\TheRock\sources]$ 'C:\Program Files\Python312\python.exe' 'D:\projects\TheRock\build_tools\repo' init -v -u https://github.com/ROCm/ROCm.git -m tools/rocm-build/rocm-6.3.1.xml -b roc-6.3.x repo: reusing existing repo client checkout in D:\projects\TheRock\sources ``` Reuse downloaded tool (future runs): ``` D:\projects\TheRock (windows-repo-download) λ python ./build_tools/fetch_sources.py Found 'repo' in script dir at D:\projects\TheRock\build_tools\repo, using it Setting up repo in D:\projects\TheRock\sources ++ Exec [D:\projects\TheRock\sources]$ 'C:\Program Files\Python312\python.exe' 'D:\projects\TheRock\build_tools\repo' init -v -u https://github.com/ROCm/ROCm.git -m tools/rocm-build/rocm-6.3.1.xml -b roc-6.3.x repo: reusing existing repo client checkout in D:\projects\TheRock\sources ```
Made some progress on pushing patches needed for a minimal local build. Now trying to get that wrapped into a GitHub Actions workflow, using this dev branch: https://github.com/ScottTodd/TheRock/tree/windows-ci-setup (workflow file is here: https://github.com/ScottTodd/TheRock/blob/windows-ci-setup/.github/workflows/build_windows_packages.yml). Here are some logs with a bunch of debug steps included: https://github.com/ScottTodd/TheRock/actions/runs/13082919002/job/36509861561. Stuck for today on
I saw that locally when symlinks were not enabled in the git repository or the fetch_sources.py file was not run. I think the right files are on the runner, but I'm still suspicious of the symlink / hardlink configuration. |
That is a strange error: there should not be a direct dependency between those two files (the dependency crosses a couple of other commands). But I think this is cmake/ninja wonkiness where for custom targets, it makes them depend on the transitive set of input files. So there is probably a lot wrong, and you're just getting that one error out of the pile randomly. One thing you could quickly try: on line 418 of therock_subproject.cmake, comment out the `"${_cmake_source_dir}/CMakeLists.txt" line. This will mean it won't retrigger on sub-project changes, but that is not a problem on CI. Still feeling like there is something else wrong and this is just the error you are getting. I can pull it down on my windows machine and try. |
https://github.com/ScottTodd/TheRock/actions/runs/13122880778/job/36612781302#step:18:60
Similar error... |
It seems to be saying pretty insistently that the file is not there. I've got nothing... The file must be not really there somehow |
🤦 okay, I can repro those eigen sources missing locally now. Probably some artifact of disabling subprojects that failed to build and that resulting in an incomplete build directory. As I was testing locally I configured a few times with different settings. The dependency that you pointed me to might still need some tweaks: TheRock/cmake/therock_subproject.cmake Lines 430 to 431 in b3307ed
At least now I can debug a bit locally. |
Double 🤦 , the eigen error was caused by removing that dependency link. It is load bearing. So I have local builds working reliably, but the GitHub-hosted runners are still having trouble seeing files, perhaps due to symlinks. Might be time to try using our runner cluster instead. |
One clue with the GitHub hosted runners is that there are two drives in use,
This happens in other projects too. Maybe our code in https://github.com/nod-ai/TheRock/blob/main/cmake/therock_subproject.cmake is somehow not handling that on Windows? I can test that theory locally... thus far I've only been using a single |
Something is not right about that log: the PROVIDE paths should be absolute, not relative. Depending on what the current drive is, that could absolutely be a problem. (edit: scratch that. PROVIDE should be relative. INJECT should be absolute) |
A few updates:
|
Progress on #36, making a larger portion of https://github.com/ROCm/rocm-core/ compile on Windows. ## About the changes * The `link.h` and `dlfcn.h` headers do not exist on Windows and are only used when `BUILD_SHARED_LIBS` is set. * The `PATH_MAX` value, defined in `limits.h`, does not exist on Windows. I opted to use a fixed constant value of `4096`, but `FILENAME_MAX` is also an option (see https://stackoverflow.com/a/65174437). * Attributes like `__attribute__((visibility("default")))` do not exist on all compilers. Added some boilerplate cross platform versions (different approaches are possible too, this is just what I use on other projects). ## How I generated the patch 1. Made changes in the source folder 2. Committed to a branch (ScottTodd/rocm-core@0dd798f) 3. Ran ``` bash .\build_tools\save_patches.sh rocm-6.3.1 rocm-core ```
Progress on #36. Together with some other pending PRs, this gets me enough to 1. Fetch sources 2. Configure with CMake 3. Build without errors Not much is actually getting built, since this disables nearly all projects on Windows, but this is enough to start running CI builds and working on components incrementally.
We're aiming to support as much of the toolkit as possible on Windows.
Initial experiments
Setup notes
git from https://git-scm.com/downloads (latest version
2.45.1
) then enabled symlinks withBuild Tools for Visual Studio 2022 from https://visualstudio.microsoft.com/downloads/ (latest version
17.12.4
)CMake from https://cmake.org/download/ (latest, version
3.31.5
)Ninja as a CMake generator from https://ninja-build.org/ (latest, version
1.12.1
)ccache from https://ccache.dev/ (latest, version
4.10.2
)Fetching sources
After cloning TheRock, I tried to checkout sources using
python ./build_tools/fetch_sources.py
. That needed the repo tool (https://gerrit.googlesource.com/git-repo/+/HEAD/README.md, docs at https://source.android.com/docs/setup/reference/repo), which is not distributed on Windows, so I downloaded the script manually. I did then have to editbuild_tools/fetch_sources.py
to execpython D:/path/to/repo
instead of justrepo
. We can teach that script how to download the file on its own and run it in a portable way.Configure and build
After fetching sources, I tried configuring and building with CMake (under VSCode):
[proc] Executing command: "C:\Program Files\CMake\bin\cmake.EXE" -DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DTHEROCK_ENABLE_RCCL=OFF -DTHEROCK_ENABLE_MATH_LIBS=OFF -DTHEROCK_ENABLE_ML_FRAMEWORKS=OFF -DTHEROCK_AMDGPU_FAMILIES=gfx110X-dgpu -UTHEROCK_AMDGPU_TARGETS -UTHEROCK_AMDGPU_DIST_BUNDLE_NAME -DLLVM_BUILD_LLVM_DYLIB=OFF --no-warn-unused-cli -SD:/projects/TheRock -Bd:/projects/TheRock/build -G Ninja
I found a few Windows issues specific to TheRock:
Next, I'm finding issues in the subprojects that TheRock includes. Sample logs: https://gist.github.com/ScottTodd/0a6625c4502169c5d54535bf122fc7fd. I'm not sure how deep these issues go, and if some of them are fixed on development branches. Here are a few I've found so far:
rocm-core
__attribute__
is not portable to all compilers (specifically MSVC) in https://github.com/ROCm/rocm-core/blob/master/rocm_version.h.in, do something likeNeed a similar fix for
__attribute__((nonnull))
, maybe_Ret_nonnull
(https://stackoverflow.com/a/78737973)rocm-core
link.h
anddlfcn.h
headers are Unix-only in https://github.com/ROCm/rocm-core/blob/master/rocm_getpath.cpp (use something likeLoadLibraryA()
on a.dll
instead ofdlopen()
on a.so
). The headers should also only be included ifBUILD_SHARED_LIBS
is defined (see next bullet)-DBUILD_SHARED_LIBS=OFF
here helps a bit:TheRock/base/CMakeLists.txt
Line 22 in 96ebf46
rocm-core in the same file uses
PATH_MAX
which is nonstandard. There is aFILENAME_MAX
that is standard but https://stackoverflow.com/a/65174437 says it is not to be trusted for memory allocation. Could do something likemin(FILENAME_MAX, 4096)
(choosing some sensible value that is notINT_MAX
)llvm warns about
Generating libLLVM is not supported on MSVC
in https://github.com/ROCm/llvm-project/blob/amd-staging/llvm/tools/llvm-shlib/CMakeLists.txt so I set-DLLVM_BUILD_LLVM_DYLIB=OFF
in my CMake configure command. Not sure if that was load bearing for anything else yet. Could auto-detect and flip that off by default in TheRock or one of the subprojects that includes it.rocm_smi_lib sets
CMAKE_CXX_FLAGS
that assume gcc/clang: https://github.com/ROCm/rocm_smi_lib/blob/e5c5a1d5b7cba41ab801a987fb6c466503411804/CMakeLists.txt#L79-L85. It seems like https://github.com/ROCm/amdsmi is the successor to rocm_smi, but it has the same issue: https://github.com/ROCm/amdsmi/blob/9872083b491c9b136496fd493414aca5c6c144cb/CMakeLists.txt#L99-L104. Both projects also only claim to support Linux.The text was updated successfully, but these errors were encountered: