-
-
Notifications
You must be signed in to change notification settings - Fork 353
Add message with debugging info to Cancelled #3256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3256 +/- ##
===================================================
+ Coverage 99.93727% 99.93774% +0.00047%
===================================================
Files 126 127 +1
Lines 19129 19274 +145
Branches 1296 1302 +6
===================================================
+ Hits 19117 19262 +145
Misses 10 10
Partials 2 2
🚀 New features to boost your workflow:
|
So far looking good, I didn't see anything immediately that I would suggest changing. Curious to hear about your immediate ideas about usecases for this |
see #3232 - if you're encountering a bunch of |
Fixed most of the problems, only remaining part is getting the right incantation for Also the error message can probably be improved |
I'll have a look, does fixing Outcome to delete its values fix it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really looking forward to using this, and to my colleagues having less to complain about when large complicated things get cancelled 😅
Many small comments below, but they're basically all tactical/local things, and I agree that once we get the gc stuff sorted out this will be basically ready to go. Thanks!
nope |
Okay I found a solution to gc+threads in 69c398b, not the prettiest but it seems to work at least? Very open to there being better ways of doing it
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor tweaks only; merge when ready!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's some comments, mostly trivial/nitpicks. I haven't looked to the end yet since I don't immediately like some stuff I saw and I want to think about whether that's justified or not!
"KeyboardInterrupt", "deadline", "explicit", "nursery", "shutdown", "unknown" | ||
] | ||
# repr(Task), so as to avoid gc troubles from holding a reference | ||
source_task: str | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure repr(Task)
is actually that useful? Like yes, it says what function this was started in and maybe you can locate the task through some gc machinery through the id
, but...
Maybe a weakref
would be more useful so people can access attributes? I feel like "where was I spawned" is already answered by the stack trace. (nevermind, just remembered this is the canceller not the cancellee)
Nevermind, I didn't think this suggestion through. A weakref wouldn't work for the common case (a task cancelling a sibling task).
I'm not convinced a strong ref here would be bad -- a Task doesn't store the exception or the cancellation reason so there's no reference cycle I think? But a string here is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a Task
contains a CancelScope
and the scope gets cancelled within the same task, the scope will then have a strong ref to a CancelReason
which will then point back to the Task
. I think?
the repr(Task)
on its own is perhaps not super useful, but in case you have multiple cancellations going on at the same time that are only distinguished by the source task then you can visually distinguish them even without other sources of the task id.
Though it does also contain the name of the function itself:
<Task 'trio._core._tests.test_run.test_Cancelled_str' at 0x\w*>
which could be very helpful if you have different functions spawned in a nursery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess task -> parent/child nursery -> cancel scope -> cancel reason -> task is a loop, yeah. That's annoying. (Or maybe CoroutineType stores frames as a strongref? That too.)
@@ -1192,8 +1240,10 @@ def parent_task(self) -> Task: | |||
"(`~trio.lowlevel.Task`): The Task that opened this nursery." | |||
return self._parent_task | |||
|
|||
def _add_exc(self, exc: BaseException) -> None: | |||
def _add_exc(self, exc: BaseException, reason: CancelReason | None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the only callers of this are internal, IMO it would be cleaner to have them set the reason inline. Also, to avoid multiple comments for similar things, why doesn't this unconditionally set _cancel_reason = reason
if it isn't None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why doesn't this unconditionally set _cancel_reason = reason if it isn't None?
in case we get multiple sources of cancellation I don't want to override the first one. In other places it's more critical, but here I could see a scenario where:
- Something causes a cancellation, be it a deadline or a crashing task or whatever
- a different task B gets cancelled, but they have an
except Cancelled
, and inside that handler they raise a different exception - without
if self.cancel_scope._cancel_reason is None:
the cause would now get set to task B raising an exception
so I'm pretty sure we need the if
, which means we'd need to write the if
three times if we did it in-line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not relevant anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at the code to see why this isn't relevant anymore, but I already typed up a response comment to this:
I'm not entirely convinced avoiding this is a good thing:
async def crasher():
await trio.sleep(2)
raise ValueError("...")
with trio.open_nursery() as nursery:
try:
nursery.start_soon(crasher)
try:
await trio.sleep(10) # app code
finally:
with trio.move_on_after(2, shield=True):
await trio.sleep(3) # cleanup code
except trio.Cancelled as c:
# what should c's cancel reason be
raise
This might matter for instance in code for shutting down stuff on exceptions, moving on after 2 seconds. The cancel reason if the clean up code ran over its 2 seconds would presumably (?) be that the exceptions happened, not that the timeout happened. I think it would make more sense if the reason was instead about the timeout.
(I haven't played with this PR yet so I'm not sure that's actually what will happen)
Would it make sense to establish some sort of causal mechanism? I.e. a field on CancelReason that points to the old CancelReason. (I guess Cancelled could store another Cancelled? But that might be bad for cycles.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah current behavior is that the crasher becomes the reason.
Storing a chain of Cancelled
sounds tricky and very likely to induce gc problems. I'm pretty sure we'd have to store any raised Cancelled
in the scope itself in order to be able to refer back to them.
... although the crash cancellation should be accessible somehow in the finally
scope to be set as __context__
. I wonder where that is getting lost
But storing a chain of reasons would be fairly straightforward and sounds like it might have some good use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I found a repro where sys.exc_info()
got cleared, but I might have mistaken myself and idr the repro anymore.
But going back to your example:
I have a pretty strong intuition that the reason the nursery scope is canceled is because a child crashed. The deadline is the reason the inner scope inside the finally
is canceled, but that cancellation will be swallowed by move_on_after
and even in a world where we stored a chain of reasons the nursery scope would never see the deadline cancellation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point: what's the cancel reason visible inside the move_on_after then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure it was deadline
, with the child-crashing cancelled in its __context__
, because of the shielding. I can add a test case for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, yeah that behavior sounds nice. Returning to the earliest response you have, is that (nursery cancel -> raise a different exception in one of the tasks) the only case where things try to overwrite the cancellation reason? If so, I think it would be nicer to make nurseries not try to cancel if they are already cancelled (which would prevent the cancellation reason from being overwritten).
I also see that changing the deadline can potentially overwrite. I don't see why that would try to cancel anything if things are already cancelled... I guess just code simplicity.
I guess it makes sense to try to handle it in one place with a check on the cancellation reason, then. I just don't like it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha. Yeah if we rewrote everything from scratch we might implement it differently.
But I think there's a bunch of ways to re-cancel, including simply calling cs.cancel(...)
multiple times
pypy fail is unrelated
|
I've seen that pypy failure before, and it usually means that it's using an outdated package cache. Clearing package cache will fix it (though I don't know how to do that myself) |
usually clicking the 🗑️ icons in https://github.com/python-trio/trio/actions/caches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine actually. Just a bunch of trivial comments.
@@ -1192,8 +1240,10 @@ def parent_task(self) -> Task: | |||
"(`~trio.lowlevel.Task`): The Task that opened this nursery." | |||
return self._parent_task | |||
|
|||
def _add_exc(self, exc: BaseException) -> None: | |||
def _add_exc(self, exc: BaseException, reason: CancelReason | None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, yeah that behavior sounds nice. Returning to the earliest response you have, is that (nursery cancel -> raise a different exception in one of the tasks) the only case where things try to overwrite the cancellation reason? If so, I think it would be nicer to make nurseries not try to cancel if they are already cancelled (which would prevent the cancellation reason from being overwritten).
I also see that changing the deadline can potentially overwrite. I don't see why that would try to cancel anything if things are already cancelled... I guess just code simplicity.
I guess it makes sense to try to handle it in one place with a check on the cancellation reason, then. I just don't like it!
src/trio/_core/_run.py
Outdated
raise Cancelled._create() | ||
|
||
self._attempt_abort(raise_cancel) | ||
self._attempt_abort(RaiseCancel(self._cancel_status._scope._cancel_reason)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like this class, though I guess it's temporary if #3233 is merged....
I am confused about the exact justification, which probably makes sense but...:
I did not find a good way of binding a CancelReason or Cancelled object to raise_cancel as a function ...
Can't you make a cancel reason / cancelled be a nonlocal? I'm not sure what more binding you need.
I can see why it's not possible to have Cancelled created outside, but could self._cancel_status._scope._cancel_reason
be used? Why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like this class, though I guess it's temporary if #3233 is merged....
I am confused about the exact justification, which probably makes sense but...:I did not find a good way of binding a CancelReason or Cancelled object to raise_cancel as a function ...
Can't you make a cancel reason / cancelled be a nonlocal? I'm not sure what more binding you need.
it's not strange that you're confused, because I also am :P I'm no expert on gc, so the current implementation is in large part from just trial & error.
I needed the reason/cancelled to be bound to the object passed to _attempt_abort
because it's (sometimes) passed onto another thread, and it's the thread that handles deleting it.
If I created it in _attempt_delivery_of_any_pending_cancel
then that function held a strong reference to the object which made tests fail, but it couldn't clean it up either because that could make it be deleted before other threads could see it.
Maybe there's a weakref solution in here, but I failed to get one to work.
But I'm totally open to there being a neat and elegant solution in here, or the gc tests being overly strict, and I'm just fumbling around too much to find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I hadn't looked closer at #3233, but that indeed does look like it'll simplify this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why it's not possible to have Cancelled created outside, but could
self._cancel_status._scope._cancel_reason
be used? Why not?
The code in _threads.py
does not have access to any CancelStatus
or scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _attempt_delivery_of_any_pending_cancel(self) -> None:
if self._abort_func is None:
return
if not self._cancel_status.effectively_cancelled:
return
def thing() -> NoReturn:
reason = self._cancel_status._scope._cancel_reason
if reason is None:
raise Cancelled._create(source="unknown", reason="misnesting")
else:
raise Cancelled._create(
source=reason.source,
reason=reason.reason,
source_task=reason.source_task,
)
self._attempt_abort(thing)
this passes a pytest
run (though I'm not running slow tests and I'm on Windows), so I think that might work? I removed the copy.copy
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does moving the reason = ...
statement out of the function work? Otherwise oh well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect it'll cause gc tests to fail, but I'll check before merging tmrw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with a del
it indeed did work :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry I meant more like:
def _attempt_delivery_of_any_pending_cancel(self) -> None:
if self._abort_func is None:
return
if not self._cancel_status.effectively_cancelled:
return
# moved reason to here:
reason = self._cancel_status._scope._cancel_reason
def thing() -> NoReturn:
if reason is None:
raise Cancelled._create(source="unknown", reason="misnesting")
else:
raise Cancelled._create(
source=reason.source,
reason=reason.reason,
source_task=reason.source_task,
)
self._attempt_abort(thing)
That way the callable keeps the CancelReason
alive even if self._cancel_status
dies. And since CancelReason
is just a plain dataclass it doesn't have any cycles or anything.
This is just about style though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am so confused about why that version works and not a ton of versions I thought were identical didn't, but oh well.
and task._runner.ki_pending | ||
): | ||
task._cancel_status._scope._cancel_reason = CancelReason( | ||
source="KeyboardInterrupt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of strange to me to set this here. Is there no way for the thing raising KeyboardInterrupt
to set this cancel reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's a super ugly place.
KeyboardInterrupt
is raised in two places: in Task._attempt_delivery_of_pending_ki
where we ... probably could access self._parent_nursery.cancel_scope
to set the reason. But the other one is in _ki.py
KIManager.install()
and that one can't see shit. But if we want it to be analogous to Cancelled
that one doesn't set the reason on raising, it sets it when the scope is cancelled - and Cancelled
is only raised later when a checkpoint is actually hit. And #3233 would have to throw that away
Though the current logic doesn't make sense in that regard either, the scope has already effectively been canceled, but it's not until we checkpoint that we set the reason to KeyboardInterrupt
.
I'd love to place it in Runner.deliver_ki
, which is the one that actually sets ki_pending = True
... but navigating from there to the revelant scope and set a reason is non-trivial.
With the current setup there's even a case for saying we shouldn't bother setting a reason at all, because the scope hasn't actually been canceled. And during KI we don't raise Cancelled
, so we don't have a consumer for the CancelReason
. You can see it through introspection.. but that seems like a big stretch during KI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my current error when trying to do it in deliver_ki
is that setting self.main_task._cancel_status._scope._cancel_reason
errors because _cancel_status
can be None
... which confuses me because there's a comment saying it can only be None
in the init task.. but I'm explicitly accessing main_task
:S
But I'm starting to lean towards not setting the reason, though #3233 would have to revisit that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we can just leave the status quo and fix it in #3233 unless we expect the PRs to be in separate releases
src/trio/_core/_tests/test_cancel.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One missing test: how about a task that is stated via nursery.start
and raises before task_status.started()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh good catch, currently the reason becomes "Code block inside nursery contextmanager raised exception [...]" - which is quite misleading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, I'm not sure how to fix that other than to just make the message more generic. There's not really any way of distinguishing the two cases, and the user might handle the exception that was generated in the start()
so can't add logic in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we catch an exception from the helper nursery and then if so, pre-emptively cancel with a better reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only sort-of-reasonable way I can see of doing it is to save the exception that is raised from start
, and then if __aexit__
has the same exception, we can infer that is the cause.
But because failing to start()
isn't actually a reason for cancellation, it can only ever be an indirect cause if that exception causes the CM block to exit, it's kind of weird to use it as the reason.
But even if the reason might be slightly misleading, I think the reasonable next step in debugging is for the developer to look up backtraces to find out what the exception is that killed the CM block, and they will then quickly find out what happened.
So thinking about it a bit more I'm not sure this is an issue in practice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd vote to merge, and then revisit if anyone reports that this is an issue in practice.
… test. minor fixes
Failures are unrelated to this PR, 3.14b1 was released the other day which is presumably the cause. Not sure why Cython is failing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment above about something that maybe would fix the function, but it's alright if you don't (though I would prefer it). I think this looks fine before or after that fix.
Codecov ReportAll modified and coverable lines are covered by tests ✅
❌ Your project status has failed because the head coverage (99.93773%) is below the target coverage (100.00000%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #3256 +/- ##
===================================================
+ Coverage 99.93728% 99.93773% +0.00045%
===================================================
Files 126 127 +1
Lines 19132 19270 +138
Branches 1296 1302 +6
===================================================
+ Hits 19120 19258 +138
Misses 10 10
Partials 2 2
🚀 New features to boost your workflow:
|
fixes #3232 by adding a
CancelReason
that's tracked by bothCancelScope
andCancelStatus
and then used when constructing aCancelled
to be raised.Remaining stuff:
CancelScope
andCancelStatus
needs to have aCancelReason
member - initially onlyCancelStatus
had it but that wasn't enough, but I haven't tried to see if I can get away with onlyCancelScope
having it.CancelScope
cs.deadline=-inf
and then checkpointing somehow ends up as anexplicit
cancelled.from_thread_check_cancelled
needs some special handling