-
Notifications
You must be signed in to change notification settings - Fork 190
Introduce a CancellationToken for cancelling specific computations
#1007
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for salsa-rs canceled.
|
CodSpeed Performance ReportMerging #1007 will degrade performances by 8.07%Comparing Summary
Benchmarks breakdown
|
|
Looks like allocator noise from the new arc alloc |
|
Interesting. Is the idea to cancel a single thread rather than all threads? Won't this cancell all threads that currently block on any query executing on the thread being cancelled? |
Yes, the thought is that this could allow implementing client side request cancellation for LSP requests for example.
That is the thing I will have to investigate. We unwind with the payload, we don't panic so I believe this does not set the panicking state of the thread actually. So this might actually just work (though I doubt it). |
Oh, so other threads would reclaim and re-execute the queries then. Interesting. |
|
That would be the ideal I think |
db715b7 to
47b7a7d
Compare
|
I was wrong, unwinding does set the panicking flag after all (which probably makes more sense ...) hmm |
f97346e to
de32cab
Compare
ba0f832 to
2318de4
Compare
|
Would mind explaining the approach in the pr summary. I'm mainly interested in how it works if other threads are blocked on a query that gets cancelled (including if they participate in a cycle) |
d2edc9b to
e0ad0ff
Compare
|
Turns out I forgot to actually implemen the relevant retry part before 🤦 I did add the bool return type but didn't fix up the usages. To my surprise my test I added worked either way, which I think makes sense? Even if we don't retry if we don't propagate the panic we will just notice the value still missing in fetch cold and recompute again. Though I think this will cause two databases to compute the same query if they both were blocked. Should probably add a test for that. Likewise I am not yet sure how to handle this for fix points yet (and need to cook up a test there as well) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to find time to think through what this means for fixpoint. What makes fixpoint different is that it creates and stores intermediate results and there's no way of reseting the database to its previous state. The other part that makes fixpoint "strange" is that "which query is called first" is meaningful because cycles are hit at different points depending on the entry query.
We do have a similar problem with panics but, after too much time spent debuging hangs, I decided to simply immediately panic if we see that a query participating in a cycle panicked before. That circumvents the entire: how do we clean up the intermediate state problem.
src/runtime.rs
Outdated
| /// | ||
| /// Returns `true` if the computation was successful, and `false` if the other thread was cancelled. | ||
| #[must_use] | ||
| #[cold] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the cold here improve performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure, wanted see if it had an impact. Would be best to check as a standalone PR I think. Will remove it for now given that.
| { | ||
| struct DbGuard<'s> { | ||
| state: &'s Attached, | ||
| state: Option<&'s Attached>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason to make this optional? It's not evident to me how it's related to the change (but it probably it is, just not evident to me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to track when the database actually changes which this helps in doing, since when it changes thats a signal for it having exited its outermost scope allowing us to reset the cancellation state.
Is "just continue the fixpoint until completion" a valid option here maybe? That is, can we notice whether a fix point is depending on us and in that case delay the unwind (or outright ignore it)? |
e0a056b to
207998d
Compare
207998d to
7ab946b
Compare
I think that would be tricky to do. It's easy in a single threaded context where you can walk the query stack and search for any query that has a non empty cycle heads array. But that won't work for multi threaded context because you'd have to walk the stack of all threads currently blocked on the current thread. But you might be able to get this information out of the I guess the ideal would be if we could add a shuttle test similar to the For now, the best we can do is probably something similar to the other panic tests where we run the test many times and hope for the best. I also think that this might work better with #1017 |
Introduces a
CancellationTokenthat can be used to cancel a specific database computation opposed to cancelling the whole runtime. This is traced by storing a second cancellation state onZalsaLocalitself opposed to the runtime cancel flag.When a thread gets cancelled, it will unwind as usual but with a different payload than pending write cancellation, threads blocked on such a cancelled thread will instead of propagating the cancellation run the computation they are blocked on themselves.
The cancellation state of the database gets reset when we exit the database TLS.