Always insert set_host_dirty(), even if there is no GPU target. #8711

mcourteaux · 2025-08-07T17:32:07Z

Always insert set_host_dirty(), even if there is no GPU target. This allows mixing CPU-only and GPU-compatible pipelines.

This is done by simply calling the inject buffer copies lowering pass, which normally does this in the case of GPU targets. The nice thing here is that it doesn't do anything besides adding those calls to set_host_dirty().
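To illustrate why an unconditional set_host_dirty() matters when mixing pipelines, here is a minimal, self-contained sketch. It is a toy model and not Halide's runtime: `ToyBuffer`, `run_cpu_pipeline`, `run_gpu_pipeline`, and the `mark_dirty` parameter are hypothetical names introduced for illustration, and the "device" is just a second vector.

```cpp
#include <vector>

// Toy stand-in for a buffer with separate host and device storage.
// Hypothetical type: this is NOT halide_buffer_t.
struct ToyBuffer {
    std::vector<int> host;
    std::vector<int> device;
    bool host_dirty = false;    // host copy is newer than the device copy
    bool device_dirty = false;  // device copy is newer than the host copy
};

// CPU-only pipeline that writes its output on the host. `mark_dirty` models
// whether the compiled pipeline contains the set_host_dirty() call.
void run_cpu_pipeline(ToyBuffer &out, int value, bool mark_dirty) {
    for (int &x : out.host) x = value;
    if (mark_dirty) out.host_dirty = true;
}

// GPU-capable pipeline that reads `in`. It uploads the host data only when
// host_dirty says the device copy is stale, then sums on the "device".
int run_gpu_pipeline(ToyBuffer &in) {
    if (in.device.size() != in.host.size()) {
        in.device.assign(in.host.size(), 0);  // fresh device allocation
    }
    if (in.host_dirty) {
        in.device = in.host;  // host-to-device copy
        in.host_dirty = false;
    }
    int sum = 0;
    for (int x : in.device) sum += x;
    return sum;
}
```

In this model, filling a four-element buffer with 3 and passing `mark_dirty = false` makes the second pipeline skip the upload and sum stale device data, while `mark_dirty = true` triggers the copy and the sum is computed from the values the CPU pipeline wrote.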

cc @abadams

mcourteaux · 2025-08-07T20:30:30Z

These simd_op_check failures without any error output are an annoying mystery...

mcourteaux · 2025-08-29T07:42:54Z

@vksnk Can you think about this in preparation for the dev meeting today? Andrew thought you might have something to say about this, regarding internal usage at Google.

vksnk · 2025-08-29T18:52:30Z

I can't think of many internal pipelines which currently rely on host_dirty flag, so I think this should be fine.

mcourteaux · 2025-08-30T14:26:16Z

Looking at the failure in the make-based builds.

Unconditionally true:

It seems that the generator test define_extern_opencl is an OpenCL-only test. It gets skipped if there is no OpenCL:

Halide/test/generator/define_extern_opencl_aottest.cpp

Lines 17 to 27 in 0653b82

    
    #elif !defined(TEST_OPENCL)
    // Avoid link errors
    extern "C" int32_t gpu_input(halide_buffer_t *input, halide_buffer_t *output) {
        return 0;
    }
    int main(int argc, char **argv) {
        printf("[SKIP] Test requires OpenCL.\n");
        return 0;
    }

It manually defines an external Func that is computed on an OpenCL device:

Halide/test/generator/define_extern_opencl_generator.cpp

Line 19 in 0653b82

gpu_input.define_extern("gpu_input", {arg}, Halide::type_of<int32_t>(), 1, NameMangling::Default, Halide::DeviceAPI::OpenCL);

The Makefile unconditionally executes the generator and links the test, even if the HL_TARGET is host (n.b. without -opencl).

The generator inserts this code at the end of the pipeline:

bool _138 = !(_17);
if (!_138)
{
 int32_t _139 = halide_error_device_dirty_with_no_device_support(_ucon, "Output buffer output");
 return _139;
}

The actual test code manually copies back:

Halide/test/generator/define_extern_opencl_aottest.cpp

Lines 195 to 196 in 0653b82

define_extern_opencl(input, output);
output.copy_to_host();

On main:

HL_TARGET=host, and the generator produces a pipeline that has no buffer-to-host copies. No calls to any of the device interfaces are made.
The test is skipped because TEST_OPENCL was not defined.

On this PR branch:

HL_TARGET=host and now the generator inserts buffer-to-host copies (exactly the purpose of this PR), because the extern Func touched the output buffer on an OpenCL device.
The generated runtime does not include any of the OpenCL runtime functions, because HL_TARGET=host. This explains the linker error.

I conclude that this test is broken or should not be run for host: the generated pipeline, which now includes the copy-back, is doing the right thing given that the extern OpenCL Func is called.

The CMake build seems to skip this test, based on the [SKIP] printing explained above:

        Start 633: generator_aot_define_extern_opencl
552/698 Test #633: generator_aot_define_extern_opencl .........................***Skipped   0.00 sec
        Start 634: generator_aotcpp_define_extern_opencl
553/698 Test #634: generator_aotcpp_define_extern_opencl ......................***Skipped   0.00 sec

However, I don't understand why the CMake build does not attempt to compile and link the generated pipeline.

mcourteaux · 2025-08-30T14:42:59Z

@alexreinking Do you have a proposal on how to properly disable the output of the generator being compiled in the Make-based system?

alexreinking · 2025-08-30T14:44:14Z

@abadams is the person to talk to about Make maintenance

abadams · 2025-09-02T17:52:54Z

There should be no situation where a pipeline compiled with HL_TARGET=host inserts a dependence on a GPU API, either at link time or run time. This is why I'm a bit dubious of the solution of just always running inject_host_dev_buffer_copies. It may inject unwanted copies. This PR should be constrained to only setting host_dirty flags.

However, the test does something that should be an error: it declares that an extern stage leaves a buffer with a dirty OpenCL allocation without having OpenCL support in the target. It wouldn't know how to do the copy-back. I'll open a PR that makes it an error.
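The kind of check described above can be sketched as follows. This is a toy model under stated assumptions, not the code from the follow-up PR and not Halide's API: `ToyDeviceAPI`, `ToyTarget`, and `check_extern_stage` are hypothetical names used only to show the idea of rejecting an extern stage whose declared device API is not enabled in the target.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the device API an extern stage declares.
enum class ToyDeviceAPI { Host, OpenCL, CUDA };

// Hypothetical stand-in for a compilation target and its enabled features.
struct ToyTarget {
    std::set<ToyDeviceAPI> enabled;  // device APIs enabled in the target

    bool supports(ToyDeviceAPI api) const {
        // The host is always available; anything else must be enabled.
        return api == ToyDeviceAPI::Host || enabled.count(api) > 0;
    }
};

// Rejects an extern stage that claims to leave its output dirty on a device
// the target cannot talk to, since no copy-back could be generated for it.
void check_extern_stage(const ToyTarget &target, ToyDeviceAPI declared,
                        const std::string &stage_name) {
    if (!target.supports(declared)) {
        throw std::invalid_argument(
            "Extern stage " + stage_name +
            " declares a device API that is not enabled in the target");
    }
}
```

With a host-only `ToyTarget`, calling `check_extern_stage` for a stage declared as `ToyDeviceAPI::OpenCL` throws at "compile time" of the pipeline, instead of surfacing later as a link error or a runtime failure.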

abadams · 2025-09-02T17:55:05Z

See #8794

abadams · 2025-09-02T17:57:41Z

An alternative to running inject_host_dev_buffer_copies would be to add code in AddImageChecks.cpp near where it checks device_dirty (line 650) to set host_dirty on outputs.

abadams · 2025-09-02T18:46:40Z

I figured out why it didn't fail for cmake too. Not important but I figured I'd write it down in case it becomes relevant in future.

This is failing for the path where we test generators via the cpp backend. When the Makefile runs one of these, after generating the c++ source and a runtime it does roughly:

c++ test.cpp pipeline.cpp runtime.a

Whereas the cmake build goes via a static library

c++ -c pipeline.cpp -o pipeline.o
ar qc pipeline.a pipeline.o
c++ test.cpp pipeline.a runtime.a

When you link against a static library, the linker only pulls in the members that define needed symbols. In this case halide_opencl_device_interface isn't needed, because the test never calls the pipeline in the skip case, so the pipeline isn't linked at all. In the Makefile build the pipeline is presented as an extra source file. If you do that (or present it as an extra object file), everything it references counts as a needed symbol, so you get a linker error.

So cmake dodged the problem by not linking the problematic code, because it was never called anyway.
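The difference between the two link lines can be reproduced with a small simulation. This is a simplified model of archive-member selection and not a real linker: `ToyObject` and `toy_link` are hypothetical names, and the symbol names in the usage note are placeholders apart from halide_opencl_device_interface, which is the symbol mentioned above.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model of an object file: symbols it defines and symbols it references.
struct ToyObject {
    std::set<std::string> defines;
    std::set<std::string> references;
};

// Returns the symbols left undefined after "linking".
// - Every entry in `objects` is linked unconditionally, like a source or
//   object file listed on the command line.
// - An entry in `archive_members` is pulled in only if it defines a symbol
//   that is referenced and still undefined, like a member of a .a file.
std::set<std::string> toy_link(const std::vector<ToyObject> &objects,
                               std::vector<ToyObject> archive_members) {
    std::set<std::string> defined, needed;
    auto add = [&](const ToyObject &o) {
        defined.insert(o.defines.begin(), o.defines.end());
        needed.insert(o.references.begin(), o.references.end());
    };
    for (const ToyObject &o : objects) add(o);

    // Rescan the archive until no further member gets pulled in.
    std::vector<bool> used(archive_members.size(), false);
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t i = 0; i < archive_members.size(); i++) {
            if (used[i]) continue;
            bool wanted = false;
            for (const std::string &s : archive_members[i].defines) {
                if (needed.count(s) && !defined.count(s)) wanted = true;
            }
            if (wanted) {
                add(archive_members[i]);
                used[i] = true;
                progress = true;
            }
        }
    }

    std::set<std::string> undefined;
    for (const std::string &s : needed) {
        if (!defined.count(s)) undefined.insert(s);
    }
    return undefined;
}
```

Passing the pipeline (which references halide_opencl_device_interface) in `objects` models the Makefile link line and leaves that symbol undefined. Passing it in `archive_members` alongside a test object that never references the pipeline models the CMake link line: the member is never pulled in, so nothing is left undefined.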

mcourteaux · 2025-10-23T16:40:50Z

I give up... I'll hold off on making more contributions until this LLVM rat race is over. 😭


mcourteaux · 2025-10-24T15:25:17Z

@alexreinking Can you trigger workflow again here too for make-macos?

abadams · 2025-10-24T18:34:04Z

All green!

mcourteaux requested review from abadams and derek-gerstmann August 7, 2025 17:32

mcourteaux added the enhancement New user-visible features or improvements to existing features. label Aug 7, 2025

mcourteaux requested a review from halidebuildbots August 7, 2025 17:33

mcourteaux changed the title ~~Always insert set_host_dirty(), even is there is no GPU target.~~ Always insert set_host_dirty(), even if there is no GPU target. Aug 7, 2025

mcourteaux added the dev_meeting Topic to be discussed at the next dev meeting label Aug 7, 2025

mcourteaux requested a review from vksnk August 7, 2025 17:45

mcourteaux added release_notes For changes that may warrant a note in README for official releases. and removed dev_meeting Topic to be discussed at the next dev meeting labels Aug 29, 2025

mcourteaux force-pushed the always-mark-host-dirty branch from ccb1490 to 2718a2f Compare August 29, 2025 21:58

alexreinking mentioned this pull request Sep 3, 2025

Suspicious CMake code for generator tests. #8790

Closed

mcourteaux force-pushed the always-mark-host-dirty branch 2 times, most recently from e83e021 to 61fa034 Compare October 10, 2025 21:36

mcourteaux force-pushed the always-mark-host-dirty branch from 61fa034 to f461ff3 Compare October 23, 2025 16:09

mcourteaux added 2 commits October 23, 2025 18:59

Remove UnsafeFPMath for LLVM 21+

69efe00

Always insert set_host_dirty(), even is there is no GPU target. This …

e54a0f2

…allows mixing CPU-only and GPU-compatible pipelines.

mcourteaux force-pushed the always-mark-host-dirty branch from f461ff3 to e54a0f2 Compare October 24, 2025 09:16

abadams approved these changes Oct 24, 2025


abadams merged commit fe0542f into halide:main Oct 24, 2025
19 checks passed
