Test failed with Error replaying transforms in contiguous ID checker, expected iS10{9} to be in the active ID set. #3919

Open
xwang233 opened this issue Feb 19, 2025 · 2 comments · May be fixed by #3926
xwang233 commented Feb 19, 2025

Repro:

import torch
from nvfuser import FusionDefinition, DataType

def nvfuser_fusion_id5(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(shape=[1, 1], contiguity=[None, None], dtype=DataType.Int, is_cpu=False, stride_order=[1, 0])
    S1 = fd.define_scalar(-2, dtype=DataType.Int)
    T7 = fd.ops.pad(T0, [0, 2, 0, 2], S1)
    fd.add_output(T7)

with FusionDefinition() as fd:
    nvfuser_fusion_id5(fd)

inputs = [
    torch.testing.make_tensor((1, 1), dtype=torch.int64, device='cuda:0'),
]
fd.execute(inputs)
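
For reference, a minimal eager-mode sketch of what this fusion should compute, assuming the pad widths follow the same innermost-dim-first ordering as torch.nn.functional.pad (the padded output would then have 3 * 3 = 9 elements, matching the iS10{9} ID in the error):

import torch
import torch.nn.functional as F

# Hypothetical reference computation: pad a [1, 1] int64 tensor with fill value -2.
# Pad widths [0, 2, 0, 2] are read as (last-dim left, last-dim right,
# next-dim left, next-dim right), as F.pad does, giving a [3, 3] output.
t = torch.testing.make_tensor((1, 1), dtype=torch.int64, device='cuda:0')
expected = F.pad(t, [0, 2, 0, 2], value=-2)
print(expected.shape)  # torch.Size([3, 3])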

Stacktrace:

Error replaying transforms in contiguous ID checker, expected iS10{9} to be in the active ID set.
Exception raised from checkExclusivelyConsumesAllocs at /opt/pytorch/nvfuser/csrc/contiguity.cpp:51 (most recent call first):
frame #0: nvfuser::nvfCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x103 (0x7f3d6bb6010f in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #1: nvfuser::nvfErrorFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x62 (0x7f3d6bf9fea2 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x3dd516 (0x7f3d6bdc0516 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-312-x86_64-linux-gnu.so)

Env:

  • pjnl-20250218
  • H100

This also causes a Thunder test to fail:

pytest -vsx thunder/tests/test_ops.py -k test_core_vs_jax_consistency_pad_nvfuser_cuda_thunder
naoyam commented Feb 19, 2025

Thanks @xwang233. Do you know whether this is an existing test that had been working before, or a new test?

xwang233 commented
The Thunder test failure looks new, and we don't have enough data on it yet.
