
@Veykril
Member

@Veykril Veykril commented Oct 17, 2025

Introduces a CancellationToken that can be used to cancel a specific database computation, as opposed to cancelling the whole runtime. This is tracked by storing a second cancellation state on ZalsaLocal itself, separate from the runtime's cancel flag.

When a thread gets cancelled, it unwinds as usual, but with a different payload than the one used for pending-write cancellation. Threads blocked on such a cancelled thread do not propagate the cancellation; instead, they run the computation they were blocked on themselves.

The cancellation state of the database is reset when we exit the database TLS.
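A minimal Rust sketch of the mechanism described above, assuming a token backed by an `Arc<AtomicBool>` that sits next to the shared runtime flag. The names `LocalState`, `Cancelled`, `unwind_if_cancelled` and `reset` are hypothetical stand-ins chosen for illustration, not the PR's actual API; only the idea (a second, per-thread flag plus a distinct unwind payload that is reset on exit) comes from the description.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical unwind payloads: one for runtime-wide "pending write"
/// cancellation, one for cancelling a single thread's computation.
#[derive(Debug, PartialEq, Eq)]
pub enum Cancelled {
    PendingWrite,
    Local,
}

/// Hypothetical handle another thread (e.g. an LSP request handler) can
/// hold on to in order to cancel one specific computation.
#[derive(Clone, Default)]
pub struct CancellationToken(Arc<AtomicBool>);

impl CancellationToken {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::Relaxed);
    }

    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical stand-in for the per-thread state (ZalsaLocal in the PR):
/// a second cancellation flag next to the shared runtime flag.
pub struct LocalState {
    pub runtime_pending_write: Arc<AtomicBool>,
    pub local: CancellationToken,
}

impl LocalState {
    /// Unwinds with a payload that records *why* the computation stopped.
    pub fn unwind_if_cancelled(&self) {
        if self.runtime_pending_write.load(Ordering::Relaxed) {
            panic::resume_unwind(Box::new(Cancelled::PendingWrite));
        }
        if self.local.is_cancelled() {
            panic::resume_unwind(Box::new(Cancelled::Local));
        }
    }

    /// Called when leaving the outermost database scope: a local
    /// cancellation only applies to the computation that was running.
    pub fn reset(&self) {
        self.local.0.store(false, Ordering::Relaxed);
    }
}

fn main() {
    let state = LocalState {
        runtime_pending_write: Arc::new(AtomicBool::new(false)),
        local: CancellationToken::default(),
    };
    // Another thread would call this through its own clone of the token.
    state.local.clone().cancel();

    let result = panic::catch_unwind(AssertUnwindSafe(|| state.unwind_if_cancelled()));
    let payload = result.unwrap_err();
    assert_eq!(payload.downcast_ref::<Cancelled>(), Some(&Cancelled::Local));

    state.reset();
    assert!(!state.local.is_cancelled());
}
```

Keeping the local payload distinct from the pending-write one is what lets a thread that was blocked on the cancelled one tell the two situations apart.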

@netlify

netlify bot commented Oct 17, 2025

Deploy Preview for salsa-rs canceled.

🔨 Latest commit: 7ab946b
🔍 Latest deploy log: https://app.netlify.com/projects/salsa-rs/deploys/69031b0c7c42df0008c957a9

@codspeed-hq

codspeed-hq bot commented Oct 17, 2025

CodSpeed Performance Report

Merging #1007 will degrade performance by 8.07%

Comparing Veykril:push-kwpwsmmosonq (7ab946b) with master (671c3dc)

Summary

❌ 4 (👁 2) regressions
✅ 9 untouched

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| 👁 amortized[Input] | 2.2 µs | 2.3 µs | -5.36% |
| amortized[InternedInput] | 2.1 µs | 2.3 µs | -8.07% |
| 👁 amortized[SupertypeInput] | 2.8 µs | 3 µs | -7.34% |
| mutating[30] | 14.3 µs | 15 µs | -4.36% |

@Veykril
Member Author

Veykril commented Oct 17, 2025

Looks like allocator noise from the new Arc allocation.

@MichaReiser
Contributor

Interesting. Is the idea to cancel a single thread rather than all threads? Won't this cancel all threads that are currently blocked on any query executing on the thread being cancelled?

@Veykril
Member Author

Veykril commented Oct 17, 2025

> Is the idea to cancel a single thread rather than all threads?

Yes, the thought is that this could allow implementing client-side request cancellation for LSP requests, for example.

> Won't this cancel all threads that are currently blocked on any query executing on the thread being cancelled?

That is the thing I will have to investigate. We unwind with the payload rather than panicking, so I believe this does not actually set the thread's panicking state. So this might just work (though I doubt it).

@MichaReiser
Contributor

> That is the thing I will have to investigate. We unwind with the payload rather than panicking, so I believe this does not actually set the thread's panicking state. So this might just work (though I doubt it).

Oh, so other threads would reclaim and re-execute the queries then. Interesting.

@Veykril
Member Author

Veykril commented Oct 17, 2025

That would be the ideal, I think.

@Veykril Veykril force-pushed the push-kwpwsmmosonq branch 3 times, most recently from db715b7 to 47b7a7d on October 19, 2025 16:45
@Veykril
Member Author

Veykril commented Oct 19, 2025

I was wrong: unwinding does set the panicking flag after all (which probably makes more sense...). Hmm.
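A small standalone check of this, using only the standard library: `std::panic::resume_unwind` does not run the panic hook, but `std::thread::panicking()` still returns `true` inside `Drop` impls while the unwind is in flight, so drop guards cannot tell it apart from a regular panic via that flag. `Cancelled` and `Guard` are illustrative names, not salsa types.

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};
use std::thread;

/// Illustrative payload used instead of a regular panic message.
struct Cancelled;

/// Records whether the thread counted as "panicking" when it was dropped.
struct Guard<'a> {
    saw_panicking: &'a Cell<bool>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.saw_panicking.set(thread::panicking());
    }
}

/// Unwinds with a custom payload and reports what the guard observed.
fn unwind_observes_panicking() -> bool {
    let saw_panicking = Cell::new(false);
    let result: Result<(), _> = panic::catch_unwind(AssertUnwindSafe(|| {
        let _guard = Guard { saw_panicking: &saw_panicking };
        // No panic hook runs here, but this is still an unwind.
        panic::resume_unwind(Box::new(Cancelled));
    }));
    assert!(result.unwrap_err().is::<Cancelled>());
    saw_panicking.get()
}

fn main() {
    // The guard saw the panicking flag set during the unwind.
    assert!(unwind_observes_panicking());
    // Once the unwind has been caught, the flag is cleared again.
    assert!(!thread::panicking());
}
```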

@Veykril Veykril force-pushed the push-kwpwsmmosonq branch 12 times, most recently from f97346e to de32cab on October 26, 2025 15:52
@Veykril Veykril marked this pull request as ready for review October 26, 2025 15:52
@Veykril Veykril force-pushed the push-kwpwsmmosonq branch 3 times, most recently from ba0f832 to 2318de4 on October 26, 2025 16:32
@Veykril Veykril requested a review from MichaReiser October 26, 2025 17:14
@MichaReiser
Contributor

Would you mind explaining the approach in the PR summary? I'm mainly interested in how it works when other threads are blocked on a query that gets cancelled (including when they participate in a cycle).

@Veykril Veykril force-pushed the push-kwpwsmmosonq branch 4 times, most recently from d2edc9b to e0ad0ff on October 28, 2025 20:58
@Veykril
Member Author

Veykril commented Oct 29, 2025

Turns out I forgot to actually implement the relevant retry part before 🤦. I did add the bool return type but didn't fix up the usages. To my surprise, the test I added passed either way, which I think makes sense? Even without a retry, as long as we don't propagate the panic, we will just notice in fetch_cold that the value is still missing and recompute it. Though I think this will cause two databases to compute the same query if both were blocked. I should probably add a test for that.

Likewise, I am not yet sure how to handle this for fixpoints (and need to cook up a test there as well).
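A simplified, single-slot sketch of that retry, assuming a `Mutex`/`Condvar`-protected memo. `Slot`, `Memo` and `fetch` are hypothetical names, not salsa's. To keep it short, a cancelled owner returns `Err(Cancelled)` instead of unwinding; either way it resets the slot to `Empty`, and a thread that was blocked on it loops around and claims the query itself instead of propagating the cancellation. Because the claim happens under the lock, only one of several waiting threads re-runs the query here; the others block on the new owner.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical state of one memoized query.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Slot {
    Empty,
    InProgress,
    Done(u64),
}

/// Simplified cancellation marker (returned here instead of unwinding).
#[derive(Debug, PartialEq)]
struct Cancelled;

struct Memo {
    slot: Mutex<Slot>,
    changed: Condvar,
}

impl Memo {
    fn new() -> Self {
        Memo { slot: Mutex::new(Slot::Empty), changed: Condvar::new() }
    }

    fn fetch(&self, compute: impl FnOnce() -> Result<u64, Cancelled>) -> Result<u64, Cancelled> {
        let mut slot = self.slot.lock().unwrap();
        loop {
            let current = *slot;
            match current {
                Slot::Done(value) => return Ok(value),
                Slot::InProgress => {
                    // Block on the owning thread. If it is cancelled it resets the
                    // slot to `Empty`, and the next iteration claims the query for
                    // this thread instead of propagating the other's cancellation.
                    slot = self.changed.wait(slot).unwrap();
                }
                Slot::Empty => {
                    *slot = Slot::InProgress;
                    drop(slot); // never hold the lock while computing
                    let result = compute();
                    let mut slot = self.slot.lock().unwrap();
                    *slot = match result {
                        Ok(value) => Slot::Done(value),
                        Err(Cancelled) => Slot::Empty, // release the claim
                    };
                    self.changed.notify_all();
                    return result;
                }
            }
        }
    }
}

fn main() {
    let memo = Memo::new();
    // A cancelled owner leaves the slot empty...
    assert_eq!(memo.fetch(|| Err(Cancelled)), Err(Cancelled));
    // ...so the next caller computes the value itself.
    assert_eq!(memo.fetch(|| Ok(42)), Ok(42));
    // Later callers reuse the memoized value.
    assert_eq!(memo.fetch(|| Ok(0)), Ok(42));
}
```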

Contributor

@MichaReiser MichaReiser left a comment

I have to find time to think through what this means for fixpoint. What makes fixpoint different is that it creates and stores intermediate results, and there's no way of resetting the database to its previous state. The other thing that makes fixpoint "strange" is that which query is called first matters, because cycles are hit at different points depending on the entry query.

We do have a similar problem with panics, but after too much time spent debugging hangs, I decided to simply panic immediately if we see that a query participating in a cycle panicked before. That sidesteps the entire "how do we clean up the intermediate state?" problem.

src/runtime.rs (outdated)
///
/// Returns `true` if the computation was successful, and `false` if the other thread was cancelled.
#[must_use]
#[cold]
Contributor

Does the cold here improve performance?

Member Author

Unsure, I wanted to see if it had an impact. It would be best to check that in a standalone PR, I think, so I'll remove it for now.

{
struct DbGuard<'s> {
state: &'s Attached,
state: Option<&'s Attached>,
Contributor

What's the reason to make this optional? It's not evident to me how it's related to the change (but it probably is, it's just not evident to me).

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to track when the database actually changes, and this helps with that: a change signals that the database has exited its outermost scope, which allows us to reset the cancellation state.
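A minimal sketch of that idea, assuming the attached database is tracked in a thread-local by address: only the guard that actually attached the database (the outermost scope) holds `Some`, nested attaches get `None`, and dropping the `Some` guard is the signal to reset the cancellation state. `Database`, `ATTACHED` and the `cancelled` field are hypothetical; salsa's real `Attached` type stores more than this.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicBool, Ordering};

thread_local! {
    /// Hypothetical per-thread slot holding the address of the attached database.
    static ATTACHED: Cell<Option<usize>> = const { Cell::new(None) };
}

/// Hypothetical database with a local (per-computation) cancellation flag.
struct Database {
    cancelled: AtomicBool,
}

/// `Some(db)` only for the guard that actually attached the database, i.e.
/// the outermost scope; nested attaches get `None` and do nothing on drop.
struct DbGuard<'s> {
    state: Option<&'s Database>,
}

impl<'s> DbGuard<'s> {
    fn attach(db: &'s Database) -> Self {
        let addr = db as *const Database as usize;
        ATTACHED.with(|attached| match attached.get() {
            None => {
                attached.set(Some(addr));
                DbGuard { state: Some(db) }
            }
            Some(current) => {
                assert_eq!(current, addr, "a different database is already attached");
                DbGuard { state: None }
            }
        })
    }
}

impl Drop for DbGuard<'_> {
    fn drop(&mut self) {
        if let Some(db) = self.state {
            // Leaving the outermost scope: detach and reset the local
            // cancellation state so the next computation starts fresh.
            ATTACHED.with(|attached| attached.set(None));
            db.cancelled.store(false, Ordering::Relaxed);
        }
    }
}

fn main() {
    let db = Database { cancelled: AtomicBool::new(false) };
    {
        let _outer = DbGuard::attach(&db);
        db.cancelled.store(true, Ordering::Relaxed);
        {
            let inner = DbGuard::attach(&db);
            assert!(inner.state.is_none()); // nested scope: not the owner
        }
        // Leaving a nested scope keeps the cancellation in place.
        assert!(db.cancelled.load(Ordering::Relaxed));
    }
    // Leaving the outermost scope resets it.
    assert!(!db.cancelled.load(Ordering::Relaxed));
}
```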

@Veykril
Member Author

Veykril commented Oct 30, 2025

> I have to find time to think through what this means for fixpoint. What makes fixpoint different is that it creates and stores intermediate results, and there's no way of resetting the database to its previous state. The other thing that makes fixpoint "strange" is that which query is called first matters, because cycles are hit at different points depending on the entry query.

Is "just continue the fixpoint until completion" maybe a valid option here? That is, can we detect whether a fixpoint depends on us and, in that case, delay the unwind (or ignore it outright)?

@Veykril Veykril force-pushed the push-kwpwsmmosonq branch 2 times, most recently from e0a056b to 207998d on October 30, 2025 07:59
@MichaReiser
Contributor

> Is "just continue the fixpoint until completion" maybe a valid option here? That is, can we detect whether a fixpoint depends on us and, in that case, delay the unwind (or ignore it outright)?

I think that would be tricky to do. It's easy in a single-threaded context, where you can walk the query stack and search for any query that has a non-empty cycle heads array. But that won't work in a multi-threaded context, because you'd have to walk the stacks of all threads currently blocked on the current thread. But you might be able to get this information out of the DependencyGraph...
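To make the two cases concrete, here is a rough sketch of that search, assuming each thread's active-query stack is visible and that a dependency graph can report which thread is blocked on which. `ActiveQuery`, `cycle_heads` as a plain `Vec`, and the `blocked_on` map are simplified, hypothetical stand-ins for salsa's internals.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical thread identifier.
type ThreadId = u32;

/// Hypothetical active-query frame; only the cycle heads matter here.
struct ActiveQuery {
    cycle_heads: Vec<u32>,
}

/// Single-threaded case: is any query on this stack part of a fixpoint?
fn stack_in_fixpoint(stack: &[ActiveQuery]) -> bool {
    stack.iter().any(|query| !query.cycle_heads.is_empty())
}

/// Multi-threaded case: check `me` and every thread (transitively) blocked
/// on it. `blocked_on` maps a waiting thread to the thread it waits for.
fn fixpoint_depends_on(
    me: ThreadId,
    stacks: &HashMap<ThreadId, Vec<ActiveQuery>>,
    blocked_on: &HashMap<ThreadId, ThreadId>,
) -> bool {
    let mut to_visit = vec![me];
    let mut seen = HashSet::new();
    while let Some(thread) = to_visit.pop() {
        if !seen.insert(thread) {
            continue; // already checked
        }
        if stacks.get(&thread).is_some_and(|stack| stack_in_fixpoint(stack)) {
            return true;
        }
        // Queue every thread that is waiting on the one just checked.
        to_visit.extend(
            blocked_on
                .iter()
                .filter(|&(_, &target)| target == thread)
                .map(|(&waiter, _)| waiter),
        );
    }
    false
}

fn main() {
    let mut stacks = HashMap::new();
    stacks.insert(1, vec![ActiveQuery { cycle_heads: vec![] }]);
    stacks.insert(2, vec![ActiveQuery { cycle_heads: vec![] }]);
    stacks.insert(3, vec![ActiveQuery { cycle_heads: vec![7] }]);

    // Thread 2 waits on thread 1 and thread 3 (iterating a fixpoint) waits on 2,
    // so cancelling thread 1 would affect a fixpoint.
    let blocked_on = HashMap::from([(2, 1), (3, 2)]);
    assert!(fixpoint_depends_on(1, &stacks, &blocked_on));
}
```

The single-threaded check is just the first step of the loop; the hard part the comment points at is keeping `blocked_on` and the other threads' stacks consistent while they keep running.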

I guess the ideal would be a shuttle test similar to the deep_conditional one, where one of the threads (chosen randomly) gets cancelled. Unfortunately, shuttle doesn't like panics... But I'd feel fairly confident about it if the shuttle tests pass, because we would have proven that it works (rather than speculating about a very hard problem 😅).

For now, the best we can do is probably something similar to the other panic tests where we run the test many times and hope for the best. I also think that this might work better with #1017

2 participants