The reduction tensor, T3, is chosen as the reference for the reduction part, but it has no ID that is connected to iS6 or iS2, so it cannot be used as the reference for the whole fusion. The scheduler nevertheless accepted it without segmentation, yielding:
As expected, because of the dangling IDs, T6 has an unscheduled ID, iS15, and because of that, its execution results in:
C++ exception with description " INTERNAL ASSERT FAILED at "/home/nmaruyama/nvfuser/debug2/csrc/runtime/compiled_kernel.cpp":1320, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. Allocations must be based on constant integers for local memory. However, found: T6_l_float[iS61{( ceilDiv(( ceilDiv(( (( (( getMetaData(T0) )).logical_size ))[1] ), 2) ), blockDim.x) )}, ithreadIdx.x60{blockDim.x}_p, iUS62{1}, iV58{2}, iS81{( (( (( getMetaData(T1) )).logical_size ))[0] )}] ca_pos( 3 ), T8_l_float[iS55{( ceilDiv(( ceilDiv(( (( (( getMetaData(T0) )).logical_size ))[1] ), 2) ), blockDim.x) )}, ithreadIdx.x54{blockDim.x}_p, iUS56{1}, iS52{2}, iS83{( (( (( getMetaData(T1) )).logical_size ))[0] )}] ca_pos( 3 ) produce_pos( 3 ), T6_l_float[iS61{( ceilDiv(( ceilDiv(( (( (( getMetaData(T0) )).logical_size ))[1] ), 2) ), blockDim.x) )}, ithreadIdx.x60{blockDim.x}_p, iUS62{1}, iV58{2}, iS81{( (( (( getMetaData(T1) )).logical_size ))[0] )}] ca_pos( 3 ), T8_l_float[iS55{( ceilDiv(( ceilDiv(( (( (( getMetaData(T0) )).logical_size ))[1] ), 2) ), blockDim.x) )}, ithreadIdx.x54{blockDim.x}_p, iUS56{1}, iS52{2}, iS83{( (( (( getMetaData(T1) )).logical_size ))[0] )}] ca_pos( 3 ) produce_pos( 3 ), have dynamic allocations but are placed in local memory.
The problem seems to be in canScheduleCompileTime of the reduction scheduler. It should detect such dangling IDs and reject the fusion if any exist. It does have some related checks, such as hasPostReductionBCast, but they are not sufficient.
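To make the idea of the missing check concrete, here is a minimal standalone sketch. It is not nvFuser code and uses no nvFuser API: IDs are modeled as plain integers, and `IdGraph`, `findDanglingIds`, and `canUseAsReference` are hypothetical names invented for illustration. The assumption is that an edge means "these two IDs are mapped to each other", and that a reference is acceptable only if every ID that needs scheduling is connected to at least one reference ID.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for an ID graph (not an nvFuser type).
// IDs are integers in [0, num_ids); connected IDs share a union-find root.
class IdGraph {
 public:
  explicit IdGraph(std::size_t num_ids) : parent_(num_ids) {
    // Every ID starts out as its own group.
    std::iota(parent_.begin(), parent_.end(), std::size_t{0});
  }

  // Record that IDs a and b are mapped to each other.
  void connect(std::size_t a, std::size_t b) {
    parent_[find(a)] = find(b);
  }

  bool connected(std::size_t a, std::size_t b) {
    return find(a) == find(b);
  }

 private:
  std::size_t find(std::size_t x) {
    assert(x < parent_.size());
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]];  // path halving
      x = parent_[x];
    }
    return x;
  }

  std::vector<std::size_t> parent_;
};

// Returns every ID in ids_to_check that is not connected to any reference ID.
std::vector<std::size_t> findDanglingIds(
    IdGraph& graph,
    const std::vector<std::size_t>& reference_ids,
    const std::vector<std::size_t>& ids_to_check) {
  std::vector<std::size_t> dangling;
  for (std::size_t id : ids_to_check) {
    bool reachable = false;
    for (std::size_t ref : reference_ids) {
      if (graph.connected(id, ref)) {
        reachable = true;
        break;
      }
    }
    if (!reachable) {
      dangling.push_back(id);
    }
  }
  return dangling;
}

// A reference is usable for the whole fusion only if nothing dangles.
bool canUseAsReference(
    IdGraph& graph,
    const std::vector<std::size_t>& reference_ids,
    const std::vector<std::size_t>& ids_to_check) {
  return findDanglingIds(graph, reference_ids, ids_to_check).empty();
}
```

In this model, a compile-time check would return false whenever `findDanglingIds` is non-empty, which would force segmentation instead of accepting the fusion. How the real scheduler would enumerate the IDs and their mappings is left open here.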
This fusion is scheduled by the reduction scheduler without segmentation, which should not happen.