
Conversation


@tinglvv tinglvv commented Sep 2, 2025


vercel bot commented Sep 2, 2025

@tinglvv is attempting to deploy a commit to the Meta Open Source Team on Vercel.

A member of the Team first needs to authorize it.

@meta-cla bot added the CLA Signed label on Sep 2, 2025
@tinglvv changed the title from "Add Windows 13.0 build" to "Add Windows 13.0 domain builds" on Sep 2, 2025

atalman commented Sep 3, 2025

@tinglvv I believe we need to run python -m tools.tests.test_generate_binary_build_matrix --update-reference-files


tinglvv commented Sep 3, 2025

The Windows nightly build is still unavailable as of 9/3 because the nightly commit does not yet include the Windows build PR (pytorch/pytorch@aa0545f); that is what causes the build failure here. We should be good to merge this PR despite the failure.


atalman commented Sep 5, 2025

This is the error:

>>> from torch.utils.cpp_extension import BuildExtension
Windows fatal exception: access violation

Current thread 0x000015d8 (most recent call first):
  File "C:\Jenkins\Miniconda3\envs\py310\lib\site-packages\torch\cuda\__init__.py", line 182 in is_available
  File "C:\Jenkins\Miniconda3\envs\py310\lib\site-packages\torch\utils\cpp_extension.py", line 116 in _find_cuda_home
  File "C:\Jenkins\Miniconda3\envs\py310\lib\site-packages\torch\utils\cpp_extension.py", line 235 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<stdin>", line 1 in <module>

Digging deeper:

>>> import torch
>>> import faulthandler
>>> faulthandler.enable()
>>> torch.cuda.is_available()
Windows fatal exception: access violation

Current thread 0x00001980 (most recent call first):
  File "C:\Jenkins\Miniconda3\envs\py310\lib\site-packages\torch\cuda\__init__.py", line 182 in is_available
  File "<stdin>", line 1 in <module>
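
For reference, a minimal repro sketch (hedged; assumes any CUDA-enabled wheel on the failing setup): enabling faulthandler before torch is imported, or setting PYTHONFAULTHANDLER=1 in the environment, captures the same traceback non-interactively, e.g. in a CI step:

# Repro sketch: install the fault handler first so the access violation
# is reported with a Python traceback instead of killing the process silently.
import faulthandler
faulthandler.enable()

import torch                      # import after the handler is installed
print(torch.__version__, flush=True)
print(torch.cuda.is_available())  # faults here on the failing wheel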

I believe this should work on both CPU and GPU machines:

>>> import torch
/home/ec2-user/github/variant-repack/.venv/lib/python3.13/site-packages/torch/_subclasses/functional_tensor.py:279: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
>>> torch.__version__
'2.8.0+cu128'
>>> torch.cuda.is_available()
False

It works with the cu128 build on Windows:

Python 3.10.18 | packaged by Anaconda, Inc. | (main, Jun  5 2025, 13:08:55) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False

Link to nightly smoke tests: https://github.com/pytorch/pytorch/actions/runs/17486852873/job/49681168445
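
As a possible stopgap while this is debugged (a sketch, not part of this PR; cuda_available_safe is a made-up helper name): probe availability in a subprocess, so a hard crash in c10_cuda.dll can only take down the child interpreter:

import subprocess
import sys

def cuda_available_safe() -> bool:
    # Hypothetical helper: exit code 0 means CUDA is usable, 1 means no CUDA;
    # any other code (e.g. the 0xc0000005 crash) is treated as unavailable.
    probe = "import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)"
    proc = subprocess.run([sys.executable, "-c", probe], capture_output=True)
    return proc.returncode == 0

print(cuda_available_safe())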

@ptrblck moved this to In Progress in PyTorch + CUDA on Sep 5, 2025

atalman commented Sep 5, 2025

Digging deeper, it looks like the failure happens at this call:

(py310) C:\actions-runner\_work\vision>python
Python 3.10.18 | packaged by Anaconda, Inc. | (main, Jun  5 2025, 13:08:55) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> hasattr(torch._C, "_cuda_getDeviceCount")
True
>>> torch._C._cuda_getDeviceCount()
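
(The call above crashes the interpreter, hence no output.) For context, a rough sketch of what torch.cuda.is_available() reduces to, based on the traceback above; the real implementation has NVML and lazy-init paths that vary by torch version:

import torch

def is_available_sketch() -> bool:
    # Simplified view of torch.cuda.is_available(): if the wheel was built
    # without CUDA the binding is missing; otherwise ask for the device count.
    if not hasattr(torch._C, "_cuda_getDeviceCount"):
        return False
    return torch._C._cuda_getDeviceCount() > 0  # faults here on the failing wheel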


atalman commented Sep 5, 2025

Error from the Windows event log:

Faulting application name: python.exe, version: 3.10.18150.1013, time stamp: 0x6841977c
Faulting module name: c10_cuda.dll, version: 0.0.0.0, time stamp: 0x68baa4b4
Exception code: 0xc0000005
Fault offset: 0x00000000000015d3
Faulting process id: 0x1908
Faulting application start time: 0x01dc1e9d4857191a
Faulting application path: C:\Jenkins\Miniconda3\envs\py310\python.exe
Faulting module path: C:\Jenkins\Miniconda3\envs\py310\lib\site-packages\torch\lib\c10_cuda.dll
Report Id: 98017187-f664-4643-9a6e-5bbd05d1084c
Faulting package full name: 
Faulting package-relative application ID: 
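
One way to narrow a fault down to a specific library (a diagnostic sketch, not something this PR does): load the DLLs shipped in torch\lib one by one with ctypes. A missing dependency raises OSError; since the fault here happens at call time rather than load time, this mainly rules out load-order and dependency problems:

import ctypes
import glob
import os
from importlib.util import find_spec

# Locate torch's lib directory without importing torch, since importing
# would already load the DLLs and mask any load-time problem.
lib_dir = os.path.join(os.path.dirname(find_spec("torch").origin), "lib")
for dll in sorted(glob.glob(os.path.join(lib_dir, "*.dll"))):
    print("loading", os.path.basename(dll), flush=True)
    try:
        ctypes.CDLL(dll)
        print("  OK")
    except OSError as exc:
        print("  FAIL:", exc)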


atalman commented Sep 5, 2025

Same issue on a GPU machine with this NVIDIA driver:

+-------------------------------+----------------------+----------------------+
| NVIDIA-SMI 528.89       Driver Version: 528.89       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4           TCC   | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P8    11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
