
Illegal Instruction Caused by grid_sample Under Windows #152385


Open

ericspod opened this issue Apr 28, 2025 · 14 comments
Labels: high priority · module: cpu · module: regression · module: windows · triage review
Milestone: 2.7.1

ericspod commented Apr 28, 2025

🐛 Describe the bug

On Windows 10 with Python 3.12.9, PyTorch 2.7.0+cu118, and CUDA 12.2, the following code produces an illegal instruction, causing an immediate crash:

import torch
from torch import nn  # the original snippet used nn.functional without this import

src = torch.rand((1, 1, 128, 64), dtype=torch.float64)
grid = torch.rand((1, 256, 256, 2), dtype=torch.float64)
dst = nn.functional.grid_sample(
    input=src.contiguous(),
    grid=grid,
    mode="bilinear",
    padding_mode="border",
    align_corners=False,
)

This is specific to float64 tensors; using float32 for both src and grid allows the function to execute correctly.
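
As a stopgap until a fixed build is available, casting to float32 for the call and back afterwards avoids the crash. A minimal workaround sketch (note the round trip through float32 does lose precision):

import torch
from torch import nn

src = torch.rand((1, 1, 128, 64), dtype=torch.float64)
grid = torch.rand((1, 256, 256, 2), dtype=torch.float64)

# Sample in float32, which does not trigger the crash, then cast the
# result back to float64.
dst = nn.functional.grid_sample(
    input=src.float().contiguous(),
    grid=grid.float(),
    mode="bilinear",
    padding_mode="border",
    align_corners=False,
).to(torch.float64)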

This issue with the current version of PyTorch is the source of CI/CD failures on GitHub Windows runners, as seen in this PR. These tests fail with PyTorch 2.7 specifically; previous versions do not exhibit the issue.

Output from /proc/cpuinfo in case any more detail is relevant:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 25
model           : 33
model name      : AMD Ryzen 9 5900X 12-Core Processor
stepping        : 2
microcode       : 0xA20120A
cpu MHz         : 3700.000
cache size      : 65536 KB
physical id     : 0
siblings        : 24
core id         : 0
cpu cores       : 24
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 17
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmuldq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw wdt topoext cpb hw_pstate ibrs ibpb stibp fsgsbase bmi1 avx2 smep bmi2 erms cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr overflow_recov succor smca
bogomips        : 7400.00
TLB size        : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp hwpstate cpb eff_freq_ro

Versions

Collecting environment information...
PyTorch version: 2.7.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro (10.0.19045 64-bit)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:49:16) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 536.23
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Name: AMD Ryzen 9 5900X 12-Core Processor
Manufacturer: AuthenticAMD
Family: 107
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 3701
MaxClockSpeed: 3701
L2CacheSize: 6144
L2CacheSpeed: None
Revision: 8450

Versions of relevant libraries:
[pip3] flake8==7.2.0
[pip3] flake8-bugbear==24.2.6
[pip3] flake8-comprehensions==3.16.0
[pip3] mypy==1.11.2
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.2.5
[pip3] onnx==1.17.0
[pip3] onnx_graphsurgeon==0.5.8
[pip3] pytorch-ignite==0.4.11
[pip3] torch==2.7.0+cu118
[pip3] torchio==0.20.7
[pip3] torchvision==0.22.0
[conda] numpy 2.2.5 pypi_0 pypi
[conda] pytorch-ignite 0.4.11 pypi_0 pypi
[conda] torch 2.7.0+cu118 pypi_0 pypi
[conda] torchio 0.20.7 pypi_0 pypi
[conda] torchvision 0.22.0 pypi_0 pypi

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168

jerryzh168 added the high priority and module: regression labels on Apr 29, 2025
malfet added the module: windows and module: cpu labels (and added, then removed, needs reproduction) on Apr 29, 2025
xuhancn (Collaborator) commented Apr 29, 2025

Linked to existing issue: #145702

ericspod (Author) commented

Thanks @xuhancn, but I'm not sure this is the same issue. I tried the example code from #145702 and couldn't reproduce the error with the same setup that causes it for my example. This seems to be a very recent problem that is specific to PyTorch 2.7, which could be explained by the switch to VS2022 for that build.

atalman added this to the 2.7.1 milestone on Apr 29, 2025
xuhancn (Collaborator) commented Apr 30, 2025

> Thanks @xuhancn, but I'm not sure this is the same issue. I tried the example code from #145702 and couldn't reproduce the error with the same setup that causes it for my example. This seems to be a very recent problem that is specific to PyTorch 2.7, which could be explained by the switch to VS2022 for that build.

Thanks for the reply; I will do more debugging work on your issue.

xuhancn self-assigned this on Apr 30, 2025
xuhancn (Collaborator) commented Apr 30, 2025

Hi @ericspod

I have debugged the PyTorch 2.7 release binary; this issue is the same as my linked issue: #145702

[screenshot: debugger disassembly snapshot showing the faulting instruction]

You can compare the snapshots: both issues are an AVX512 instruction being executed on an AVX2-only machine.

Reason: ymm21 is an AVX512 register (registers ymm16–ymm31 are only encodable with AVX512's EVEX prefix), ref: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions.
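
For anyone checking whether their machine is in the affected class, a quick diagnostic sketch follows. Note the dispatcher correctly reports AVX2 on these CPUs; the bug is a stray AVX512 instruction inside the AVX2-compiled kernel, so this only identifies susceptible hardware, it does not prevent the crash. It assumes torch.backends.cpu.get_cpu_capability() is available, as in recent 2.x releases:

import platform
import torch

# SIMD level this torch build will dispatch to at runtime.
print("torch:", torch.__version__)
print("dispatch capability:", torch.backends.cpu.get_cpu_capability())
print("machine:", platform.processor())

# On Linux/WSL, cross-check the raw CPU flags: the Ryzen 9 5900X above
# reports avx2 but no avx512f, so any AVX512 instruction is illegal there.
try:
    with open("/proc/cpuinfo") as f:
        flags = next(line for line in f if line.startswith("flags")).split()
    print("avx2:", "avx2" in flags, "| avx512f:", "avx512f" in flags)
except (FileNotFoundError, StopIteration):
    print("/proc/cpuinfo not available (e.g. native Windows)")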

xuhancn (Collaborator) commented Apr 30, 2025

Hi @atalman,

I built and ran the code locally, and the issue did not occur.

[screenshot: local build test result]

The root cause is an issue in PyTorch's official VS2022 build environment, as I commented in #145702 (comment).
I still suggest switching the Windows build back to VS2019.

malfet (Contributor) commented Apr 30, 2025

I think the VS2022 upgrade is a misnomer, i.e. at least in CI, jobs are still running with VS2019.

A few things to investigate:

  • As nightly binaries are available for download, can someone volunteer to bisect at what point it started to happen? (A bisect driver sketch follows this list.)
  • Could it be that we never set up the architecture flags correctly, and the builder has been upgraded to a more modern/capable machine, so it compiles for AVX512 by default?
  • Could it be that some dependency update leaked flags into CXXFLAGS? (This can be checked by comparing the CMakeCache/build logs for the last successful and first failing builds.)
  • Do we know if the AVX512 instructions are invoked while running a semaphore or something similar? (We had a similar issue on an ARM platform that leaked more modern instructions because they were used in std::condition_variable::wait or something of that nature.)
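
A rough bisect driver over the dated nightlies, as a sketch only: the 2.7.0.devYYYYMMDD version scheme and nightly index URL are assumed from the usual download.pytorch.org layout, and the probe relies on the crash killing the child interpreter:

import subprocess
import sys

REPRO = (
    "import torch;"
    "src = torch.rand((1, 1, 128, 64), dtype=torch.float64);"
    "grid = torch.rand((1, 256, 256, 2), dtype=torch.float64);"
    "torch.nn.functional.grid_sample(src, grid, mode='bilinear',"
    " padding_mode='border', align_corners=False)"
)
INDEX = "https://download.pytorch.org/whl/nightly/cpu"

def crashes(date):
    # Install the nightly for `date`, then run the repro in a child
    # process: an illegal instruction kills the child, so a nonzero
    # return code (with no Python traceback) marks a bad build.
    install = subprocess.run(
        [sys.executable, "-m", "pip", "install", "--pre",
         f"torch==2.7.0.dev{date}", "--index-url", INDEX],
        capture_output=True,
    )
    if install.returncode != 0:
        return None  # no wheel published for that date
    return subprocess.run([sys.executable, "-c", REPRO]).returncode != 0

for date in ("20250101", "20250201", "20250301"):  # narrow by hand
    print(date, "->", crashes(date))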

xuhancn (Collaborator) commented May 1, 2025

Hi @malfet, @atalman,

Actually, I have been tracking this issue for a long time. Let me go through it on a timeline.

  1. Early on, we had both VS2019 and VS2022 build environments (before Jan 22: Windows builds with VS2022 #145319).
    a. The CPU and CUDA wheels were built with VS2019.
    b. The XPU wheel was built with VS2022.

  2. The Intel XPU team found this issue and assigned it to me on Dec 17, 2024. We have an issue here: LNL Windows python error Illegal instruction intel/torch-xpu-ops#1173
    I debugged it and found:
    a. I couldn't reproduce it locally in my VS2022 environment.
    b. It only occurred in the XPU wheel.
    c. I made a PR to switch the CPU wheel compiler to VS2022 ([don't merge] build cpu via vs2022 (test diff) #143826), and the issue occurred there too (Dec 25, 2024).

  3. Based on item 2, I concluded that PyTorch's official build environment caused this issue, and I created an issue to report it: PyTorch VS2022 official build Windows binary illegal instruction on AVX2(max ISA level) CPU #145702

  4. I found a PR (Windows builds with VS2022 #145319) that switched the CPU and CUDA wheel builds to VS2022; I debugged the nightly build and updated the issue: PyTorch VS2022 official build Windows binary illegal instruction on AVX2(max ISA level) CPU #145702 (comment) (Feb 20)
    a. Windows builds with VS2022 #145319 had no comments and was easy to merge.
    b. The VS2019 code was cleaned up on Jan 29: Cleanup VS 2019 refs in pytorch #145863

CI does not detect this issue because the crash comes from an AVX512 instruction generated in the AVX2 code path; it only occurs on client CPUs that lack AVX512. (A validation sketch for such a machine follows below.)
For this issue, @atalman, could you please switch release/2.7 to VS2019 and validate its wheel on a client machine? We need to confirm the root cause before discussing next steps.
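
A sketch of what that client-machine validation could look like, reusing the same repro one-liner as the bisect sketch above: run the repro in a child process and inspect the exit code, since the illegal instruction kills the interpreter outright (on Windows the status code is 0xC000001D, STATUS_ILLEGAL_INSTRUCTION):

import subprocess
import sys

REPRO = (
    "import torch;"
    "src = torch.rand((1, 1, 128, 64), dtype=torch.float64);"
    "grid = torch.rand((1, 256, 256, 2), dtype=torch.float64);"
    "torch.nn.functional.grid_sample(src, grid, mode='bilinear',"
    " padding_mode='border', align_corners=False);"
    "print('OK')"
)

result = subprocess.run([sys.executable, "-c", REPRO])
if result.returncode == 0:
    print("wheel passed the grid_sample repro")
elif result.returncode & 0xFFFFFFFF == 0xC000001D:
    print("illegal instruction: the wheel still contains stray AVX512 code")
else:
    print(f"repro exited with unexpected code {result.returncode}")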

malfet (Contributor) commented May 1, 2025

@xuhancn can you just propose the PR on trunk that rolls back to VS2019, validate the binary, and then we'll cherry-pick it to 2.7.1? (Also, do you know if it's possible to reserve a non-AVX512-capable machine on AWS, or somehow simulate the failure on more modern hardware?)

atalman (Contributor) commented May 1, 2025

@xuhancn yes, we can open a PR to switch to VS2019 and regenerate the binary for testing.

xuhancn (Collaborator) commented May 1, 2025

I drafted a PR to roll back to VS2019: #152613
Let's wait for the CI. @malfet @atalman

atalman (Contributor) commented May 1, 2025

Hi @xuhancn, you should be able to download the built binary from #152613 and test it.

xuhancn (Collaborator) commented May 1, 2025

> Hi @xuhancn, you should be able to download the built binary from #152613 and test it.

Yes, I will download the wheel and validate it.

xuhancn (Collaborator) commented May 1, 2025

Hi @atalman,

I have downloaded wheel-py3_9-cpu.zip from https://github.com/pytorch/pytorch/actions/runs/14779507170?pr=152613, whose SHA256 is 058fd377bcbe2928cf8162261cfc5d92bdd724f45fcd43ed42c3d5c9aa80b00c.

I then tested the wheel on my Intel 12th Gen Core CPU. The test passed.

[screenshot: test result on the Intel 12th Gen machine]

CC: @malfet

ericspod (Author) commented May 1, 2025

Hi @xuhancn @atalman, I have installed the nightly wheel for Python 3.12 (SHA256: f1fb15bba2b1a3bbfdc99e01830df145c27037bfd4c52474a1bc2cb1745ca452) into the environment that sees the illegal instruction, and it no longer occurs. As seen above, the change here appears to have resolved the issue.
