-
Notifications
You must be signed in to change notification settings - Fork 800
[L0v2] fix unbounded memory growth of queue's submitted kernels #20847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl-rel-6_2
Are you sure you want to change the base?
[L0v2] fix unbounded memory growth of queue's submitted kernels #20847
Conversation
|
I've created a PR against 6.2, since this functionality changed a little in the latest version, and will require a different patch. |
nrspruit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
753bc9e to
f99e1cd
Compare
f99e1cd to
57aef87
Compare
57aef87 to
156235d
Compare
L0v2 avoids internally tracking each kernel submission through an event for lifetime management. Instead, when a kernel is submitted to the queue, its handle is added to a vector, to be removed at the next queue synchronization point, urQueueFinish(). This is a much more efficient way of handling kernel tracking, since it avoids taking and storing an event. However, if the application never synchronizes the queue, this vector of submitted kernels will grow unbounded. This patch forcibly synchronizes the queue once the submitted kernels vector reaches a threshold.
| ur_kernel_handle_t hKernel) { | ||
| locked<ur_command_list_manager> &commandList, ur_kernel_handle_t hKernel) { | ||
| if (submittedKernels.size() > MAX_QUEUE_SUBMITTED_KERNELS) { | ||
| synchronize(commandList); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is very low chance that we will have 1000 unique kernels, most likely we have duplicates here.
RIght now this will always synchronize every 1k submits which is undesired.
Can you compact the vector by finding duplicates and releasing those?
You just need one kernel instance in container to hold the object all additional ones are not neeeded.
Also what I would consider is not adding kernel to the vector if it is already there.
L0v2 avoids internally tracking each kernel submission through an event for lifetime management. Instead, when a kernel is submitted to the queue, its handle is added to a vector, to be removed at the next queue synchronization point, urQueueFinish(). This is a much more efficient way of handling kernel tracking, since it avoids taking and storing an event. However, if the application never synchronizes the queue, this vector of submitted kernels will grow unbounded.
This patch forcibly synchronizes the queue once the submitted kernels vector reaches a threshold.