Illegal Instruction Caused by grid_sample Under Windows #152385
Comments
Linked to existing issue: #145702
Thanks @xuhancn, but I'm not sure this is the same issue. I tried the example code from #145702 and couldn't reproduce the error on the same setup that triggers it with my example. This also seems to be a very recent problem specific to PyTorch 2.7, which could be explained by the switch to VS2022 for that build.
Thanks for the reply. I will do more debugging on your issue.
Hi @ericspod, I have debugged the PyTorch 2.7 release binary, and this issue is the same as my linked issue #145702. You can compare the snapshot. In both issues, an AVX512 instruction is executed on an AVX2 machine. Reason:
Hi @atalman, I built and ran the code locally, and the issue does not occur. The cause is a problem with the official PyTorch VS2022 build environment, as I commented in #145702 (comment).
I think "VS2022 upgrade" is a misnomer, i.e. at least in CI, jobs are still running with VS2019. A few things to investigate:
Actually, I have been tracking this issue for a long time. Let me go through it as a timeline.
CI does not detect this issue because the crash is caused by an AVX512 instruction generated in the AVX2 code path; it only occurs on
@xuhancn can you just propose a PR on trunk that rolls back to VS2019 and validate the binary, and then we'll cherry-pick it to 2.7.1? (Also, do you know if it's possible to reserve a non-AVX512-capable machine on AWS? Or somehow simulate the failure on more modern hardware?)
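For anyone trying to tell whether a given machine is in the affected AVX2-only category, the CPU feature flags can be inspected. The following is only a minimal sketch, assuming Linux-style `/proc/cpuinfo` text is available (for example under WSL); the helper names `parse_cpu_flags` and `is_avx2_only` are hypothetical and not part of PyTorch.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found


def is_avx2_only(flags: set[str]) -> bool:
    """True when AVX2 is reported but the AVX512 foundation flag is not."""
    return "avx2" in flags and "avx512f" not in flags


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = parse_cpu_flags(f.read())
    except OSError:
        flags = set()  # file not available, e.g. on native Windows
    print("AVX2 without AVX512:", is_avx2_only(flags))
```

On a machine where this prints `True`, an AVX512 instruction that leaks into the AVX2 code path would be expected to fault, which matches the behaviour described in this thread.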
@xuhancn yes, we can open a PR to switch to VS 2019 and regenerate the binary for testing.
Hi @atalman, I have downloaded … and then tested its wheel on my 12th Gen Intel Core CPU. The test passed. CC: @malfet
Hi @xuhancn @atalman, I have installed the nightly wheel for Python 3.12 (sha256:f1fb15bba2b1a3bbfdc99e01830df145c27037bfd4c52474a1bc2cb1745ca452) into the environment that was hitting the illegal instruction, and the crash no longer occurs. As seen above, the change here seems to have resolved the issue.
🐛 Describe the bug
On Windows 10 with Python 3.12.9, PyTorch 2.7.0+cu118, and CUDA 12.2, the following code produces an "illegal instruction" error that causes an immediate crash:
This is specific to float64 tensors; when both src and grid are float32, the function executes correctly.
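The kind of call described here can be sketched as follows. This is a hypothetical minimal example, not the reporter's original snippet: the tensor shapes, the random inputs, the use of CPU tensors, and the function name `run_grid_sample_float64` are all assumptions chosen for illustration. On an affected machine with the affected build, the `grid_sample` call is where the crash would be expected.

```python
def run_grid_sample_float64():
    """Call grid_sample on float64 CPU tensors; return the output shape.

    Returns None when PyTorch is not installed.
    """
    try:
        import torch
        import torch.nn.functional as F
    except ImportError:
        return None

    # Assumed shapes: input is (N, C, H, W); grid is (N, H_out, W_out, 2).
    src = torch.rand(1, 1, 4, 4, dtype=torch.float64)
    # Sampling coordinates must lie in [-1, 1].
    grid = torch.rand(1, 4, 4, 2, dtype=torch.float64) * 2 - 1

    out = F.grid_sample(
        src, grid, mode="bilinear", padding_mode="zeros", align_corners=False
    )
    return tuple(out.shape)


if __name__ == "__main__":
    print(run_grid_sample_float64())
```

Changing both `dtype` arguments to `torch.float32` gives the variant that the report says runs correctly.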
This issue in the current version of PyTorch is the source of CI/CD failures on GitHub Windows runners, as seen in this PR. The tests fail specifically with PyTorch 2.7; previous versions do not exhibit the issue.
Output from /proc/cpuinfo in case any more detail is relevant:
Versions
Collecting environment information...
PyTorch version: 2.7.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Pro (10.0.19045 64-bit)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:49:16) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 536.23
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Name: AMD Ryzen 9 5900X 12-Core Processor
Manufacturer: AuthenticAMD
Family: 107
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 3701
MaxClockSpeed: 3701
L2CacheSize: 6144
L2CacheSpeed: None
Revision: 8450
Versions of relevant libraries:
[pip3] flake8==7.2.0
[pip3] flake8-bugbear==24.2.6
[pip3] flake8-comprehensions==3.16.0
[pip3] mypy==1.11.2
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.2.5
[pip3] onnx==1.17.0
[pip3] onnx_graphsurgeon==0.5.8
[pip3] pytorch-ignite==0.4.11
[pip3] torch==2.7.0+cu118
[pip3] torchio==0.20.7
[pip3] torchvision==0.22.0
[conda] numpy 2.2.5 pypi_0 pypi
[conda] pytorch-ignite 0.4.11 pypi_0 pypi
[conda] torch 2.7.0+cu118 pypi_0 pypi
[conda] torchio 0.20.7 pypi_0 pypi
[conda] torchvision 0.22.0 pypi_0 pypi
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168