It is not possible to modify the task (e.g. update and upload a checkpoint) in a callback registered with Task.register_abort_callback.
Trying to save a checkpoint in the callback gives the following error:
2024-09-13 12:12:27,581 - clearml.model - WARNING - Could not update last created model in Task b281b21329e3470ebc8959e831f28ff8, Task status 'stopped' cannot be updated
To reproduce
Register a callback on the current task using something like this:
from clearml import Task

def on_abort_callback() -> None:
    print("Saving last checkpoint")
    trainer.save_checkpoint(
        self.last_filepath,
        weights_only=self.save_weights_only,
    )
    # Ensure that the trainer stops gracefully
    trainer.should_stop = True

print("Registering model checkpoint abort callback")
Task.current_task().register_abort_callback(on_abort_callback)
Here, trainer is a pytorch-lightning Trainer, and the callback is registered inside an extended (subclassed) lightning ModelCheckpoint.
Expected behaviour
It should be possible to upload a model checkpoint to the ClearML server when a task is aborted in the abort callback function.
The current workaround is to mark the task in_progress while saving the checkpoint and then mark it stopped again afterwards. Not intuitive :-)
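For reference, the workaround could be sketched roughly as below. This is a minimal sketch, not the code from my project: the helper names temporarily_in_progress and make_abort_callback are made up for illustration, and it assumes the task object exposes mark_started(force=...) and mark_stopped(force=...) the way clearml.Task does (please check the signatures against your installed ClearML version).

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def temporarily_in_progress(task: Any) -> Iterator[None]:
    """Mark `task` in_progress for the duration of the block, then stop it again."""
    # Assumption: `task` has mark_started(force=...) and mark_stopped(force=...),
    # as clearml.Task does. Hypothetical helper, not part of ClearML.
    task.mark_started(force=True)
    try:
        yield
    finally:
        # Always restore the stopped status, even if saving fails.
        task.mark_stopped(force=True)


def make_abort_callback(task: Any, save_checkpoint: Callable[[], None]) -> Callable[[], None]:
    """Build an abort callback that saves a checkpoint while the task is in_progress."""

    def on_abort_callback() -> None:
        with temporarily_in_progress(task):
            save_checkpoint()

    return on_abort_callback


# Hypothetical usage with ClearML and a lightning trainer:
#   task = Task.current_task()
#   task.register_abort_callback(
#       make_abort_callback(task, lambda: trainer.save_checkpoint(last_filepath))
#   )
```

The try/finally is there so the task goes back to stopped even when the checkpoint upload raises, otherwise an aborted task could be left looking like it is still running.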
mads-oestergaard changed the title from "Tasks are already marked stopped when calling the callback from Task.register_abort_callback" to "Task is already marked stopped when the callback from Task.register_abort_callback is called" on Sep 18, 2024
Related Discussion
https://clearml.slack.com/archives/CTK20V944/p1726571061754989