@mcourteaux (Contributor) commented Aug 7, 2025

Always insert set_host_dirty(), even if there is no GPU target. This allows mixing CPU-only and GPU-compatible pipelines.

This is done by simply calling the inject buffer copies lowering pass, which normally does this in the case of GPU targets. The nice thing here is that it doesn't do anything besides adding those calls to set_host_dirty().

cc @abadams
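As a toy illustration of why the flag matters when mixing pipelines, the C++ sketch below models a buffer shared by a CPU-only producer and a GPU consumer. MockBuffer, cpu_only_pipeline and gpu_pipeline_sum are hypothetical; only the host_dirty name is borrowed from halide_buffer_t, and the real runtime's copy logic is more involved.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a halide_buffer_t shared by two pipelines.
struct MockBuffer {
    std::vector<int> host;    // CPU-side copy of the data
    std::vector<int> device;  // stand-in for the GPU allocation
    bool host_dirty = false;  // host copy is newer than the device copy
};

// CPU-only pipeline: fills the host copy. `mark_host_dirty` models the
// behaviour with (true) and without (false) the set_host_dirty() call.
void cpu_only_pipeline(MockBuffer &buf, int value, bool mark_host_dirty) {
    for (int &x : buf.host) {
        x = value;
    }
    if (mark_host_dirty) {
        buf.host_dirty = true;
    }
}

// GPU pipeline: copies host -> device only if the host copy is marked
// dirty, then reads from the device copy.
int gpu_pipeline_sum(MockBuffer &buf) {
    assert(buf.device.size() == buf.host.size());
    if (buf.host_dirty) {
        buf.device = buf.host;  // stands in for a host-to-device copy
        buf.host_dirty = false;
    }
    int sum = 0;
    for (int x : buf.device) {
        sum += x;
    }
    return sum;
}
```

With the flag set, the GPU side copies the fresh host data and sums to 6 for a three-element buffer filled with 2; without it, the GPU side silently reads its stale device copy (all zeros here).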

@mcourteaux mcourteaux added the enhancement New user-visible features or improvements to existing features. label Aug 7, 2025
@mcourteaux mcourteaux changed the title Always insert set_host_dirty(), even is there is no GPU target. Always insert set_host_dirty(), even if there is no GPU target. Aug 7, 2025
@mcourteaux mcourteaux added the dev_meeting Topic to be discussed at the next dev meeting label Aug 7, 2025
@mcourteaux mcourteaux requested a review from vksnk August 7, 2025 17:45
@mcourteaux (Contributor Author)

These simd_op_check failures without any error output are an annoying mystery...

@mcourteaux (Contributor Author)

@vksnk Can you think about this in preparation for the dev meeting today? Andrew thought you might have something to say about it regarding internal usage at Google.

@vksnk (Member) commented Aug 29, 2025

I can't think of many internal pipelines that currently rely on the host_dirty flag, so I think this should be fine.

@mcourteaux mcourteaux added release_notes For changes that may warrant a note in README for official releases. and removed dev_meeting Topic to be discussed at the next dev meeting labels Aug 29, 2025
@mcourteaux mcourteaux force-pushed the always-mark-host-dirty branch from ccb1490 to 2718a2f Compare August 29, 2025 21:58
@mcourteaux (Contributor Author)

Looking at the failure in the Make-based builds.

True on both main and this branch:

  • It seems that the generator test define_extern_opencl is an OpenCL-only test. It gets skipped if there is no OpenCL:
    #elif !defined(TEST_OPENCL)
    // Avoid link errors
    extern "C" int32_t gpu_input(halide_buffer_t *input, halide_buffer_t *output) {
        return 0;
    }
    int main(int argc, char **argv) {
        printf("[SKIP] Test requires OpenCL.\n");
        return 0;
    }
  • It manually defines an external Func that is computed on an OpenCL device:
    gpu_input.define_extern("gpu_input", {arg}, Halide::type_of<int32_t>(), 1, NameMangling::Default, Halide::DeviceAPI::OpenCL);
  • The Makefile unconditionally executes the generator and links the test, even if the HL_TARGET is host (n.b. without -opencl).
  • The generator inserts this code at the end of the pipeline:
    bool _138 = !(_17);
    if (!_138)
    {
     int32_t _139 = halide_error_device_dirty_with_no_device_support(_ucon, "Output buffer output");
     return _139;
    }
  • The actual test code manually copies back:
    define_extern_opencl(input, output);
    output.copy_to_host();

On main:

  • With HL_TARGET=host, the generator produces a pipeline with no buffer-to-host copies, and none of the device-interface functions are called.
  • The test is skipped because TEST_OPENCL was not defined.

On this PR branch:

  • With HL_TARGET=host, the generator now inserts buffer-to-host copies (exactly the purpose of this PR), because the extern Func touched the output buffer on an OpenCL device.
  • The generated runtime does not include any of the OpenCL runtime functions, because HL_TARGET=host. This explains the linker error.

I conclude that this test is either broken or should not be built for a host target: given that the extern Func runs on OpenCL, the generated pipeline that now includes the copy-back is doing the correct thing.

The CMake build appears to skip this test, judging by the [SKIP] message described above:

        Start 633: generator_aot_define_extern_opencl
552/698 Test #633: generator_aot_define_extern_opencl .........................***Skipped   0.00 sec
        Start 634: generator_aotcpp_define_extern_opencl
553/698 Test #634: generator_aotcpp_define_extern_opencl ......................***Skipped   0.00 sec

However, I don't understand why the CMake build does not attempt to compile and link the generated pipeline.

@mcourteaux (Contributor Author)

@alexreinking Do you have a suggestion for how to properly prevent the generator's output from being compiled in the Make-based build?

@alexreinking (Member)

@abadams is the person to talk to about Make maintenance

@abadams (Member) commented Sep 2, 2025

There should be no situation where a pipeline compiled with HL_TARGET=host inserts a dependence on a GPU API, either at link time or run time. This is why I'm a bit dubious of the solution of just always running inject_host_dev_buffer_copies. It may inject unwanted copies. This PR should be constrained to only setting host_dirty flags.

However, the test does something that should be an error: it declares that an extern stage leaves a buffer with a dirty OpenCL allocation without having OpenCL support in the target. It wouldn't know how to do the copy-back. I'll open a PR that makes it an error.
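A rough sketch of the kind of check being proposed: while compiling, reject an extern stage that claims its output lives on a device API the target does not enable. MockTarget, MockDeviceAPI and check_extern_device_api are hypothetical stand-ins rather than Halide's Target/DeviceAPI, and this is not necessarily how #8794 implements it.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; not Halide's real Target or DeviceAPI types.
enum class MockDeviceAPI { Host, OpenCL, CUDA };

struct MockTarget {
    std::set<MockDeviceAPI> device_apis;  // empty models HL_TARGET=host

    bool supports(MockDeviceAPI api) const {
        return api == MockDeviceAPI::Host || device_apis.count(api) > 0;
    }
};

// Models a check run while compiling the pipeline: an extern stage may only
// leave its output dirty on a device that the target has runtime support for.
void check_extern_device_api(const std::string &stage, MockDeviceAPI api,
                             const MockTarget &target) {
    assert(!stage.empty());
    if (!target.supports(api)) {
        throw std::invalid_argument("Extern stage \"" + stage +
                                    "\" uses a device API that the target does not enable");
    }
}
```

In this model, declaring `gpu_input` with OpenCL on a host-only target is rejected up front, instead of surfacing later as a missing halide_opencl_device_interface symbol at link time.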

@abadams (Member) commented Sep 2, 2025

See #8794

@abadams (Member) commented Sep 2, 2025

An alternative to running inject_host_dev_buffer_copies would be to add code in AddImageChecks.cpp, near where it checks device_dirty (line 650), to set host_dirty on outputs.

@abadams (Member) commented Sep 2, 2025

I figured out why it didn't fail for CMake too. Not important, but I figured I'd write it down in case it becomes relevant in the future.

This fails on the path where we test generators via the C++ backend. When the Makefile runs one of these tests, after generating the C++ source and a runtime, it does roughly:

c++ test.cpp pipeline.cpp runtime.a

whereas the CMake build goes through a static library:

c++ -c pipeline.cpp -o pipeline.o
ar qc pipeline.a pipeline.o
c++ test.cpp pipeline.a runtime.a

When you link against a static library, the linker only pulls in the archive members that provide symbols something else needs. In the skip case the test never calls the pipeline, so pipeline.o is never pulled in, and its reference to halide_opencl_device_interface never has to be resolved. In the Makefile build the pipeline is presented as an extra source file. If you do that (or present it as an extra object file), it is always linked in, so every symbol it references must be resolved, and you get a linker error.

So CMake dodged the problem by not linking the problematic code, which was never called anyway.

@mcourteaux mcourteaux force-pushed the always-mark-host-dirty branch 2 times, most recently from e83e021 to 61fa034 Compare October 10, 2025 21:36
@mcourteaux mcourteaux force-pushed the always-mark-host-dirty branch from 61fa034 to f461ff3 Compare October 23, 2025 16:09
@mcourteaux (Contributor Author)

I give up... I'll hold off on making more contributions until this LLVM rat race is over. 😭

@mcourteaux mcourteaux force-pushed the always-mark-host-dirty branch from f461ff3 to e54a0f2 Compare October 24, 2025 09:16
@mcourteaux (Contributor Author)

@alexreinking Can you re-trigger the make-macos workflow here too?

@abadams (Member) commented Oct 24, 2025

All green!

@abadams abadams merged commit fe0542f into halide:main Oct 24, 2025
19 checks passed