Windows platform support bringup #36

Open
ScottTodd opened this issue Jan 30, 2025 · 14 comments

@ScottTodd
Member

ScottTodd commented Jan 30, 2025

We're aiming to support as much of the toolkit as possible on Windows.

  • Our build target is "native" Windows (not WSL 1 or WSL 2), compiling using standard build tools (e.g. MSVC).
  • We should support local builds, have regular CI builds (GitHub Actions), and eventually be able to distribute artifacts.
  • We should also have any appropriate documentation, developer guides, etc. needed.

Initial experiments

Setup notes

Fetching sources

After cloning TheRock, I tried to check out sources using python ./build_tools/fetch_sources.py. That needs the repo tool (https://gerrit.googlesource.com/git-repo/+/HEAD/README.md, docs at https://source.android.com/docs/setup/reference/repo), which has no package manager distribution on Windows, so I downloaded the script manually. I then had to edit build_tools/fetch_sources.py to exec python D:/path/to/repo instead of just repo. We can teach that script to download the file on its own and run it in a portable way.
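
A minimal sketch of what "download the file on its own and run it in a portable way" could look like. The helper names (`find_repo`, `download_repo`, `repo_command`) are hypothetical, and the download URL is an assumption that should be checked against the repo tool's README; this is not the actual code in fetch_sources.py.

```python
import shutil
import sys
import urllib.request
from pathlib import Path
from typing import List, Optional

# Assumed location of the repo launcher script; verify before relying on it.
REPO_URL = "https://storage.googleapis.com/git-repo-downloads/repo"


def find_repo(script_dir: Path) -> Optional[Path]:
    """Return a repo script found on PATH or in script_dir, else None."""
    on_path = shutil.which("repo")
    if on_path:
        return Path(on_path)
    local = script_dir / "repo"
    return local if local.is_file() else None


def download_repo(script_dir: Path) -> Path:
    """Download the launcher into script_dir (requires network access)."""
    target = script_dir / "repo"
    with urllib.request.urlopen(REPO_URL) as response:
        target.write_bytes(response.read())
    return target


def repo_command(repo_path: Path, args: List[str]) -> List[str]:
    """Build a command that runs repo through the current interpreter.

    Invoking it as `python repo ...` avoids relying on shebang handling,
    which Windows does not provide.
    """
    return [sys.executable, str(repo_path), *args]
```

A caller would use `find_repo(script_dir) or download_repo(script_dir)` and pass the result to `repo_command` before handing the list to `subprocess`.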

Configure and build

After fetching sources, I tried configuring and building with CMake (under VSCode):
[proc] Executing command: "C:\Program Files\CMake\bin\cmake.EXE" -DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DTHEROCK_ENABLE_RCCL=OFF -DTHEROCK_ENABLE_MATH_LIBS=OFF -DTHEROCK_ENABLE_ML_FRAMEWORKS=OFF -DTHEROCK_AMDGPU_FAMILIES=gfx110X-dgpu -UTHEROCK_AMDGPU_TARGETS -UTHEROCK_AMDGPU_DIST_BUNDLE_NAME -DLLVM_BUILD_LLVM_DYLIB=OFF --no-warn-unused-cli -SD:/projects/TheRock -Bd:/projects/TheRock/build -G Ninja

I found a few Windows issues specific to TheRock:

Next, I'm finding issues in the subprojects that TheRock includes. Sample logs: https://gist.github.com/ScottTodd/0a6625c4502169c5d54535bf122fc7fd. I'm not sure how deep these issues go, and if some of them are fixed on development branches. Here are a few I've found so far:

ScottTodd added the enhancement label Jan 30, 2025
@ScottTodd
Member Author

Got a build to succeed (run to completion with exit code 0) after disabling many subprojects and writing some local patches.

@amd-chrissosa

For 3: running git diff master Branch1 > ../patchfile and checking that in should be fine. The apply-patches functionality seems pretty straightforward: https://github.com/nod-ai/TheRock/blob/main/build_tools/fetch_sources.py

@stellaraccident
Contributor

stellaraccident commented Jan 31, 2025

FYI - for managing the patches directory, I've been making commits directly in the git directories that repo checks out. Then you dump them all with git format-patch. There's a handy (Linux) script in build_tools/save_patches.sh, but it can just be a one-liner:

git format-patch -o ~/TheRock/patches/rocm-6.3.1/rocm-core rocm-6.3.1

Then if you re-run fetch_sources.py it will reset to the base tag (rocm-6.3.1) and re-apply the patches, same as a fresh checkout on the CI.

I was doing it the manual way via git diff but it was absolutely murder trying to upstream them and such. This way, you have a nice stack of commits in your git repo and can cherry-pick/rebase to a new branch to make an upstream PR, etc.

@stellaraccident
Contributor

Sorry - I could have documented my flow on this better: key is that git format-patch is the inverse of git am (which is what fetch_sources.py does to re-apply)
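
To make the format-patch / am round trip concrete, here is a small sketch that only builds the two command lines. The function names are hypothetical, and how fetch_sources.py actually invokes git am is not shown in this thread.

```python
from pathlib import Path
from typing import List


def format_patch_cmd(base_tag: str, out_dir: Path) -> List[str]:
    """Command that writes one .patch file per commit made since base_tag."""
    return ["git", "format-patch", "-o", str(out_dir), base_tag]


def am_cmd(patch_files: List[Path]) -> List[str]:
    """Command that replays patch files as commits, in sorted order.

    git format-patch numbers its output (0001-..., 0002-...), so sorting
    the file names restores the original commit order.
    """
    return ["git", "am", *[str(p) for p in sorted(patch_files)]]
```

Each list can be passed to `subprocess.check_call(cmd, cwd=checkout_dir)`, where `checkout_dir` is the sub-repository being patched.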

@ScottTodd
Member Author

Thanks, the save_patches.sh script worked fine for me (using git bash under cmder)

ScottTodd added a commit that referenced this issue Jan 31, 2025
Progress on #36. This makes it
easier to get set up on Windows, which is missing a package manager
distribution of the repo tool
(https://gerrit.googlesource.com/git-repo/). The tool is a standalone
python script that can be run portably via `python repo`. Linux users
will probably be more comfortable running it as an executable as just
`repo`, but that isn't possible on Windows without some tricks.

## Sample log snippets

Download (first run):
```
D:\projects\TheRock (windows-repo-download)
λ python ./build_tools/fetch_sources.py
Unable to find 'repo', downloading into script dir at D:\projects\TheRock\build_tools\repo
Setting up repo in D:\projects\TheRock\sources
++ Exec [D:\projects\TheRock\sources]$ 'C:\Program Files\Python312\python.exe' 'D:\projects\TheRock\build_tools\repo' init -v -u https://github.com/ROCm/ROCm.git -m tools/rocm-build/rocm-6.3.1.xml -b roc-6.3.x
repo: reusing existing repo client checkout in D:\projects\TheRock\sources
```

Reuse downloaded tool (future runs):
```
D:\projects\TheRock (windows-repo-download)
λ python ./build_tools/fetch_sources.py
Found 'repo' in script dir at D:\projects\TheRock\build_tools\repo, using it
Setting up repo in D:\projects\TheRock\sources
++ Exec [D:\projects\TheRock\sources]$ 'C:\Program Files\Python312\python.exe' 'D:\projects\TheRock\build_tools\repo' init -v -u https://github.com/ROCm/ROCm.git -m tools/rocm-build/rocm-6.3.1.xml -b roc-6.3.x
repo: reusing existing repo client checkout in D:\projects\TheRock\sources
```
@ScottTodd
Member Author

ScottTodd commented Feb 1, 2025

Made some progress on pushing patches needed for a minimal local build. Now trying to get that wrapped into a GitHub Actions workflow, using this dev branch: https://github.com/ScottTodd/TheRock/tree/windows-ci-setup (workflow file is here: https://github.com/ScottTodd/TheRock/blob/windows-ci-setup/.github/workflows/build_windows_packages.yml). Here are some logs with a bunch of debug steps included: https://github.com/ScottTodd/TheRock/actions/runs/13082919002/job/36509861561.

Stuck for today on

Run cmake --build build
[0/2] Re-checking globbed directories...
ninja: error: 'D:/a/TheRock/TheRock/base/rocm-cmake/CMakeLists.txt', needed by 'compile_commands_fragment_rocm-cmake.json', missing and no known rule to make it
Error: Process completed with exit code 1.

I saw that locally when symlinks were not enabled in the git repository or when fetch_sources.py had not been run. I think the right files are on the runner, but I'm still suspicious of the symlink / hardlink configuration.
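
A quick way to test the symlink suspicion is a pre-build check that reports how each subproject source path appears on disk. This is a sketch with hypothetical helper names; it relies on the fact that git, when symlink support is disabled, checks a link out as a small text file containing the target path.

```python
from pathlib import Path
from typing import Dict, Iterable


def classify_source_dir(path: Path) -> str:
    """Describe how a subproject source path appears on disk."""
    if path.is_symlink():
        # A link whose target is absent still reports is_symlink() == True.
        return "symlink" if path.exists() else "broken symlink"
    if path.is_dir():
        return "directory"
    if path.is_file():
        # With symlinks disabled, git writes the link as a plain text file.
        return "plain file"
    return "missing"


def find_unbuildable(root: Path, subdirs: Iterable[str]) -> Dict[str, str]:
    """Map each subdir lacking a CMakeLists.txt to its on-disk kind."""
    problems = {}
    for name in subdirs:
        source = root / name
        if not (source / "CMakeLists.txt").is_file():
            problems[name] = classify_source_dir(source)
    return problems
```

For example, `find_unbuildable(Path("base"), ["half", "rocm-cmake", "rocm-core"])` returns an empty dict when every directory resolves to real sources.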

@stellaraccident
Contributor

That is a strange error: there should not be a direct dependency between those two files (the dependency crosses a couple of other commands). But I think this is cmake/ninja wonkiness where for custom targets, it makes them depend on the transitive set of input files. So there is probably a lot wrong, and you're just getting that one error out of the pile randomly.

One thing you could quickly try: on line 418 of therock_subproject.cmake, comment out the `"${_cmake_source_dir}/CMakeLists.txt"` line. This will mean it won't retrigger on sub-project changes, but that is not a problem on CI.

Still feeling like there is something else wrong and this is just the error you are getting.

I can pull it down on my windows machine and try.

@ScottTodd
Member Author

> One thing you could quickly try: on line 418 of therock_subproject.cmake, comment out the `"${_cmake_source_dir}/CMakeLists.txt"` line. This will mean it won't retrigger on sub-project changes, but that is not a problem on CI.

https://github.com/ScottTodd/TheRock/actions/runs/13122880778/job/36612781302#step:18:60

 [1/34] Configure sub-project therock-eigen (in background)
FAILED: third-party/eigen/stamp/configure.stamp third-party/eigen/build/CMakeCache.txt third-party/eigen/build/cmake_install.cmake compile_commands_fragment_therock-eigen.json D:/a/TheRock/TheRock/build/third-party/eigen/stamp/configure.stamp D:/a/TheRock/TheRock/build/third-party/eigen/build/CMakeCache.txt D:/a/TheRock/TheRock/build/third-party/eigen/build/cmake_install.cmake D:/a/TheRock/TheRock/build/compile_commands_fragment_therock-eigen.json 
C:\Windows\system32\cmd.exe /C "cd /D D:\a\TheRock\TheRock\build\third-party\eigen\build && "C:\Program Files\CMake\bin\cmake.exe" -GNinja -BD:/a/TheRock/TheRock/build/third-party/eigen/build -SD:/a/TheRock/TheRock/build/third-party/eigen/source -DCMAKE_INSTALL_PREFIX=D:/a/TheRock/TheRock/build/third-party/eigen/stage -DCMAKE_TOOLCHAIN_FILE=D:/a/TheRock/TheRock/build/third-party/eigen/_toolchain.cmake -DCMAKE_PROJECT_TOP_LEVEL_INCLUDES=D:/a/TheRock/TheRock/build/third-party/eigen/_init.cmake && "C:\Program Files\CMake\bin\cmake.exe" -E touch D:/a/TheRock/TheRock/build/third-party/eigen/build/compile_commands.json && "C:\Program Files\CMake\bin\cmake.exe" -E touch D:/a/TheRock/TheRock/build/third-party/eigen/stamp/configure.stamp && "C:\Program Files\CMake\bin\cmake.exe" -E copy D:/a/TheRock/TheRock/build/third-party/eigen/build/compile_commands.json D:/a/TheRock/TheRock/build/compile_commands_fragment_therock-eigen.json"
CMake Error: The source directory "D:/a/TheRock/TheRock/build/third-party/eigen/source" does not appear to contain CMakeLists.txt.

Similar error...

@stellaraccident
Contributor

It seems to be saying pretty insistently that the file is not there. I've got nothing... The file must somehow not really be there.

@ScottTodd
Member Author

🤦 okay, I can repro those missing eigen sources locally now. Probably an artifact of disabling subprojects that failed to build, which left an incomplete build directory. As I was testing locally I configured a few times with different settings.

The dependency that you pointed me to might still need some tweaks:

DEPENDS
"${_cmake_source_dir}/CMakeLists.txt"

At least now I can debug a bit locally.

@ScottTodd
Member Author

Double 🤦, the eigen error was caused by removing that dependency link. It is load-bearing. So I have local builds working reliably, but the GitHub-hosted runners are still having trouble seeing files, perhaps due to symlinks. Might be time to try using our runner cluster instead.

@ScottTodd
Member Author

One clue with the GitHub-hosted runners is that there are two drives in use, C:/ and D:/:

-- Found Python3: C:/hostedtoolcache/windows/Python/3.11.9/x64/python3.exe (found suitable version "3.11.9", minimum required is "3.9") found components: Interpreter
-- Including subproject therock-boost (from D:/a/TheRock/TheRock/third-party/boost/cmake_project)
--   PROVIDE Boost = lib/cmake/Boost-1.87.0 (from therock-boost)
--   PROVIDE boost_atomic = lib/cmake/boost_atomic-1.87.0 (from therock-boost)
--   PROVIDE boost_filesystem = lib/cmake/boost_filesystem-1.87.0 (from therock-boost)
--   PROVIDE boost_headers = lib/cmake/boost_headers-1.87.0 (from therock-boost)
--   PROVIDE boost_system = lib/cmake/boost_system-1.87.0 (from therock-boost)
--   CONFIGURE_DEPENDS:  
--   JOB_POOL: therock_background
-- Including subproject therock-eigen (from D:/a/TheRock/TheRock/build/third-party/eigen/source)

This happens in other projects too. Maybe our code in https://github.com/nod-ai/TheRock/blob/main/cmake/therock_subproject.cmake is somehow not handling that on Windows? I can test that theory locally... thus far I've only been using a single D:/ drive.

@stellaraccident
Contributor

stellaraccident commented Feb 4, 2025

Something is not right about that log: the PROVIDE paths should be absolute, not relative. Depending on what the current drive is, that could absolutely be a problem.

(edit: scratch that. PROVIDE should be relative. INJECT should be absolute)
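
To illustrate why a relative path can be a problem when two drives are involved, here is a sketch using Python's `ntpath` module, which applies Windows path rules on any platform. The helper names are hypothetical, and whether the drive split actually caused the CI failure is not established in this thread.

```python
import ntpath


def on_same_drive(path_a: str, path_b: str) -> bool:
    """True when two Windows paths share a drive letter (case-insensitive)."""
    drive_a = ntpath.splitdrive(path_a)[0].lower()
    drive_b = ntpath.splitdrive(path_b)[0].lower()
    return drive_a == drive_b


def relative_or_absolute(path: str, start: str) -> str:
    """Return path relative to start, or the normalized path if drives differ."""
    try:
        return ntpath.relpath(path, start)
    except ValueError:
        # ntpath.relpath raises ValueError for paths on different drives,
        # because no relative path can cross from one drive to another.
        return ntpath.normpath(path)
```

The Python interpreter under C:/hostedtoolcache and the checkout under D:/a/TheRock from the log above are exactly such a pair: `on_same_drive` reports False for them, so any path between the two has to stay absolute.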

@ScottTodd
Member Author

A few updates:

  1. I got a successful run on GitHub Actions using standard GitHub-hosted runners here: https://github.com/ScottTodd/TheRock/actions/runs/13142910967/job/36674106275. I had to replace the symlinks with direct copies of the relevant files:

          - name: Replace symlinks with copies
            run: |
              rm base/half
              rm base/rocm-cmake
              rm base/rocm-core
    
              rm -rf sources/half/.git
              rm -rf sources/rocm-cmake/.git
              rm -rf sources/rocm-core/.git
    
              cp -r sources/half base
              cp -r sources/rocm-cmake base
              cp -r sources/rocm-core base
  2. I started testing using our larger self-hosted Windows runners here: https://github.com/nod-ai/TheRock/actions/runs/13124975600/job/36619337677. Hit an SSL: CERTIFICATE_VERIFY_FAILED error trying to download repo. Can make more progress there by checking the file in to third-party instead of downloading on demand, or can try mirroring to a location with a different network/auth setup.

  3. I also tested a bit on my other Windows machine (Windows 10) and found that fetching the files outside of a dev drive is noticeably slower on the same network. My setup there also ran into some bumps with the patching (due to my SSH key git signing config...), so if we could move to git submodules with forks that carry patches natively instead of using repo + patch files that would simplify quite a bit.
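
The symlink workaround from item 1 could also be written as a portable Python step rather than shell commands. This is a sketch with a hypothetical function name; unlike the workflow step above, it skips `.git` while copying instead of deleting it from `sources/`.

```python
import shutil
from pathlib import Path
from typing import Iterable


def replace_links_with_copies(root: Path, names: Iterable[str]) -> None:
    """Replace base/<name> links with real copies of sources/<name>."""
    for name in names:
        link = root / "base" / name
        source = root / "sources" / name
        if link.is_symlink() or link.is_file():
            # Covers real symlinks and the text placeholders git writes
            # when symlink support is disabled.
            link.unlink()
        elif link.is_dir():
            # A real directory left over from a previous run.
            shutil.rmtree(link)
        # Copy the sources, leaving .git metadata behind.
        shutil.copytree(source, link, ignore=shutil.ignore_patterns(".git"))
```

Calling `replace_links_with_copies(Path("."), ["half", "rocm-cmake", "rocm-core"])` from the repository root mirrors the three directories handled in the workflow, and the call can be repeated safely.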

ScottTodd added a commit that referenced this issue Feb 7, 2025
Progress on #36, making a larger
portion of https://github.com/ROCm/rocm-core/ compile on Windows.

## About the changes

* The `link.h` and `dlfcn.h` headers do not exist on Windows and are
only used when `BUILD_SHARED_LIBS` is set.
* The `PATH_MAX` value, defined in `limits.h`, does not exist on
Windows. I opted to use a fixed constant value of `4096`, but
`FILENAME_MAX` is also an option (see
https://stackoverflow.com/a/65174437).
* Attributes like `__attribute__((visibility("default")))` do not exist
on all compilers. Added some boilerplate cross platform versions
(different approaches are possible too, this is just what I use on other
projects).

## How I generated the patch

1. Made changes in the source folder
2. Committed to a branch
(ScottTodd/rocm-core@0dd798f)
3. Ran
    
    ```
    bash .\build_tools\save_patches.sh rocm-6.3.1 rocm-core
    ```
ScottTodd added a commit that referenced this issue Feb 7, 2025
Progress on #36.

Together with some other pending PRs, this gets me enough to

1. Fetch sources
2. Configure with CMake
3. Build without errors

Not much is actually getting built, since this disables nearly all
projects on Windows, but this is enough to start running CI builds and
working on components incrementally.