Deadlock when changing GPU frequency during upload #654

Open
@russelltg

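Pretty minor bug here, but a bug nonetheless. We have an OpenCL-accelerated application using OpenCV, and when I ran `sudo intel_gpu_frequency -m` while it was running, the application locked up inside the OpenCL driver. It doesn't happen every time, so my theory is that it only deadlocks if the frequency change lands while a buffer transfer is in flight (the stack below was captured inside `clEnqueueReadBuffer`, i.e. a download rather than an upload).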

Here's the stack:

    frame #0: 0x00007fffdb4e0cab libc.so.6`__sched_yield at syscall-template.S:120
    frame #1: 0x00007ffeed55410e libigdrcl.so`NEO::CommandStreamReceiver::baseWaitFunction(unsigned int volatile*, NEO::WaitParams const&, unsigned int) at gthr-default.h:693:32
    frame #2: 0x00007ffeed470330 libigdrcl.so`NEO::CommandStreamReceiverHw<NEO::TGLLPFamily>::waitForTaskCountWithKmdNotifyFallback(unsigned int, unsigned long, bool, NEO::QueueThrottle) at command_stream_receiver_hw_base.inl:861:47
    frame #3: 0x00007ffeed0c5186 libigdrcl.so`NEO::CommandQueue::waitUntilComplete(unsigned int, NEO::Range<NEO::CopyEngineState>, unsigned long, bool, bool, bool) at command_queue.cpp:259:91
    frame #4: 0x00007ffeed0c8c33 libigdrcl.so`NEO::CommandQueue::waitForAllEngines(bool, NEO::PrintfHandler*, bool) at command_queue.cpp:1044:46
    frame #5: 0x00007ffeed2526a1 libigdrcl.so`NEO::CommandQueueHw<NEO::TGLLPFamily>::finish() at command_queue.h:218:39
    frame #6: 0x00007ffeed0ca1db libigdrcl.so`NEO::CommandQueue::cpuDataTransferHandler(NEO::TransferProperties&, NEO::EventsRequest&, int&) at cpu_data_transfer_handler.cpp:97:23
    frame #7: 0x00007ffeed253902 libigdrcl.so`NEO::CommandQueueHw<NEO::TGLLPFamily>::enqueueReadWriteBufferOnCpuWithMemoryTransfer(unsigned int, NEO::Buffer*, unsigned long, unsigned long, void*, unsigned int, _cl_event* const*, _cl_event**) at command_queue_hw_base.inl:64:27
    frame #8: 0x00007ffeed2ac73d libigdrcl.so`NEO::CommandQueueHw<NEO::TGLLPFamily>::enqueueReadBuffer(NEO::Buffer*, unsigned int, unsigned long, unsigned long, void*, NEO::GraphicsAllocation*, unsigned int, _cl_event* const*, _cl_event**) at enqueue_read_buffer.h:62:65
    frame #9: 0x00007ffeed0990e2 libigdrcl.so`clEnqueueReadBuffer at api.cpp:2309:50
    frame #10: 0x00007fffecbe85d4 libopencv_core4d.so.407`cv::ocl::OpenCLAllocator::download(this=0x000055555fc62140, u=0x00007ffc5415b010, dstptr=0x00007ffc72750040, dims=2, sz=0x00007ffcab27f660, srcofs=0x00007ffcab27f560, srcstep=0x00007ffcab27fc40, dststep=0x00007ffcab27f3d0) const at ocl.cpp:6194:17
    frame #11: 0x00007fffecc9639d libopencv_core4d.so.407`cv::UMat::copyTo(this=0x00007ffcab27fc00, _dst=0x00007ffcab27f9d0) const at umatrix.cpp:1184:23
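
Reading frames #0 to #2, the thread appears to be stuck in a polling wait: it yields repeatedly until a task count written on completion reaches the expected value. The sketch below is not the NEO source. It is a minimal illustration of that pattern under my own assumptions, and `TaskCountTag` and `waitForTaskCount` are hypothetical names. It shows why such a loop never returns if the writer stops advancing the counter, and how a deadline turns the hang into a reportable failure.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the memory location that is updated when
// submitted work completes. Illustrative only, not the driver's type.
using TaskCountTag = std::atomic<std::uint32_t>;

// Poll until the tag reaches `expected` or `timeout` elapses.
// Returns true if the tag reached the expected value, false on timeout.
bool waitForTaskCount(const TaskCountTag& tag, std::uint32_t expected,
                      std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (tag.load(std::memory_order_acquire) < expected) {
        if (std::chrono::steady_clock::now() >= deadline) {
            // The writer never advanced the tag. Without this check the
            // loop would spin forever, which matches the observed hang.
            return false;
        }
        // With libstdc++ on Linux this ends up in sched_yield, as in frame #0.
        std::this_thread::yield();
    }
    return true;
}
```

For example, calling `waitForTaskCount(tag, 5, std::chrono::milliseconds(20))` on a tag that stays at 4 returns `false` after roughly 20 ms instead of blocking indefinitely. Whether the driver's wait can be bounded this way is a question for the maintainers.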

Setup:
Ubuntu 22.04
intel-opencl-icd=22.14.22890-1
Kernel: 6.3.8-arch1-1 (Ubuntu is running in a Docker container, but that shouldn't affect any of this)
