Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfactor now causes out of bounds error. #8600

Open
vksnk opened this issue Mar 21, 2025 · 7 comments
Open

rfactor now causes out of bounds error. #8600

vksnk opened this issue Mar 21, 2025 · 7 comments

Comments

@vksnk
Copy link
Member

vksnk commented Mar 21, 2025

After recent rfactor changes there are several generators which have started to fail with Input buffer _buffer_name_ is accessed at _number_, which is beyond the max (_another_number_) in dimension 1, they all seem to have schedules which follow the pattern:

.split(r.x, rx, tile_size_x)
.split(r.y, ry, tile_size_y)
.rfactor({{rx, x}, {ry, y}})

This now seems to fail with the above error if the input dimension extent is not divisible by the corresponding tile size. I tried to use various tail strategies, but it doens't seem to have any effect on it. I also tried to use specialize and only rfactor if the extents are divisible by tile size, but that also didn't work. I can try to provide a more isolated test to demonstrate it, but I wanted to check first if it's even supposed to work or is this error is legit?

@abadams
Copy link
Member

abadams commented Mar 21, 2025

I saw a similar issue in production, and the cause was that rfactor used to be silently (and incorrectly) dropping the if statement from GuardWithIf if you rfactor the child of a split. So unfortunately this is most likely due to a fixed bug. Not sure what the link is between this and overconservative bounds inference though. In my case it was an atomic vectorize failing to do its thing.

@vksnk
Copy link
Member Author

vksnk commented Mar 21, 2025

I see, this makes sense. I am a bit surprised that specialize trick didn't work, but it seems that it doesn't really help if I put rfactor inside of the specialize. The only thing which worked was to add boundary conditions to the input, is there any alternative way to make it work?

@abadams
Copy link
Member

abadams commented Mar 21, 2025

If you have a small repro I could take a look and see

@vksnk
Copy link
Member Author

vksnk commented Mar 21, 2025

I think this would be the simplest example to demonstrate the problem:

int rfactor_split() {
    Var x("x"), y("y");
    RDom r(0, 10, 0, 10);

    // Create an input with random values
    Buffer<uint8_t> input(10, 10, "input");
    for (int y = 0; y < 10; ++y) {
        for (int x = 0; x < 10; ++x) {
            input(x, y) = (rand() % 256);
        }
    }


    Func f;

    f() = 0;
    f() += input(r.x, r.y);
    RVar rxo, rxi, ryo, ryi;
    f.update()
        .tile(r.x, r.y, rxo, ryo, rxi, ryi, 4, 4)
        .rfactor({{rxi, x}, {ryi, y}})
        .compute_root()
        .update()
        .vectorize(x)
        .unroll(y);

    Buffer<int> im = f.realize();

    return 0;
}

@vksnk
Copy link
Member Author

vksnk commented Mar 21, 2025

Actually, don't even need to vectorize or unroll:

int rfactor_split() {
    Var x("x"), y("y");
    RDom r(0, 10, 0, 10);

    // Create an input with random values
    Buffer<uint8_t> input(10, 10, "input");
    for (int y = 0; y < 10; ++y) {
        for (int x = 0; x < 10; ++x) {
            input(x, y) = (rand() % 256);
        }
    }


    Func f;

    f() = 0;
    f() += input(r.x, r.y);
    RVar rxo, rxi, ryo, ryi;
    f.update()
        .tile(r.x, r.y, rxo, ryo, rxi, ryi, 4, 4)
        .rfactor({{rxi, x}, {ryi, y}})
        .compute_root();

    Buffer<int> im = f.realize();

    return 0;
}

@abadams
Copy link
Member

abadams commented Mar 21, 2025

Yeah, the if statement is now (correctly) there, and allocation bounds inference isn't understanding it. Without the rfactor we get a loop that looks like:

  for (f0.s1.r4$y.r7, f0.s1.r4$y.r7.loop_min, f0.s1.r4$y.r7.loop_extent) {
    let f0.s1.r4$y.r8.base = (f0.s1.r4$y.r7*4) + f0.s1.r4$y.loop_min
    for (f0.s1.r4$x.r5, f0.s1.r4$x.r5.loop_min, f0.s1.r4$x.r5.loop_extent) {
     let f0.s1.r4$x.r6.base = (f0.s1.r4$x.r5*4) + f0.s1.r4$x.loop_min
     for (f0.s1.r4$y.r8, f0.s1.r4$y.r8.loop_min, f0.s1.r4$y.r8.loop_extent) {
      let f0.s1.r4$y = f0.s1.r4$y.r8.base + f0.s1.r4$y.r8
      if ((uint1)likely(f0.s1.r4$y <= f0.s1.r4$y.loop_max)) {
       let f0.s1.r4$y.guarded = promise_clamped(f0.s1.r4$y, f0.s1.r4$y, f0.s1.r4$y.loop_max)
       for (f0.s1.r4$x.r6, f0.s1.r4$x.r6.loop_min, f0.s1.r4$x.r6.loop_extent) {
        let f0.s1.r4$x = f0.s1.r4$x.r6.base + f0.s1.r4$x.r6
        if ((uint1)likely(f0.s1.r4$x <= f0.s1.r4$x.loop_max)) {
         let f0.s1.r4$x.guarded = promise_clamped(f0.s1.r4$x, f0.s1.r4$x, f0.s1.r4$x.loop_max)
         f0() = f0() + int32((uint8)input(f0.s1.r4$x.guarded, f0.s1.r4$y.guarded))
        }
       }
      }
     }
    }
   }
  }

The .guarded symbols are specifically to assist allocation bounds inference.

With the rfactor we have:

    for (f0_intm.s1.__outermost, f0_intm.s1.__outermost.loop_min, f0_intm.s1.__outermost.loop_extent) {
     for (f0_intm.s1.r7, f0_intm.s1.r7.loop_min, f0_intm.s1.r7.loop_extent) {
      for (f0_intm.s1.r5, f0_intm.s1.r5.loop_min, f0_intm.s1.r5.loop_extent) {
       for (f0_intm.s1.y, f0_intm.s1.y.loop_min, f0_intm.s1.y.loop_extent) {
        if ((uint1)likely((uint1)likely(((f0_intm.s1.r7*4) + f0_intm.s1.y) <= 9))) {
         for (f0_intm.s1.x, f0_intm.s1.x.loop_min, f0_intm.s1.x.loop_extent) {
          if ((uint1)likely((uint1)likely(((f0_intm.s1.r5*4) + f0_intm.s1.x) <= 9))) {
           f0_intm(f0_intm.s1.x, f0_intm.s1.y) = f0_intm(f0_intm.s1.x, f0_intm.s1.y) + int32((uint8)input(((f0_intm.s1.r5*4) + 0) + f0_intm.s1.x, ((f0_intm.s1.r7*4) + 0) + f0_intm.s1.y))
          }
         }
        }
       }
      }
     }
    }
   }

No .guarded symbols, so allocation bounds inference doesn't understand it. We're probably not getting them because that IfThenElse doesn't come from a GuardWithIf split on the intermediate Func. It's inherited from the original Func's split instead. I think it might be attached as a where clause on the intermediate Func's RDom (Func.cpp:741). @alexreinking thoughts on how we can help out bounds inference here?

Oh, looks like it should be handling substitutions at Func.cpp:743 instead of dropping them.

@abadams
Copy link
Member

abadams commented Mar 21, 2025

To be clear, while the previous code may have been reading out of bounds, I think this overzealous assert counts as a Halide bug, and not code you need to change internally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants