… when task previously aborted or retired Signed-off-by: jorgee <jorge.ejarque@seqera.io>
    if( task.failCount > 0 && task.config.getAttempt() != task.failCount + 1 ) {
        task.config.attempt = task.failCount + 1
        task.resolve(taskBody)
    }
This part is not strictly required to fix the case reported in the issue.
However, when a task has failed in previous executions but has never completed, it is not cached, so it is re-executed on resume. That re-execution currently starts with task.attempt = 1; with this code it starts with task.attempt = failCount + 1.
This is the case exercised by the added test. The process is defined to fail on the first attempt and succeed on every later one, so it should execute twice in total. In the test, the run is aborted after the first retry. Without this code, the resumed run executes the task twice again (the first attempt fails and the second succeeds). With this code, the previously failed execution is counted as an attempt, so the task runs only once.
I am not sure whether re-executing with attempt = 1 was intended, or whether the attempt number should be updated according to the cached failures, as this code does. @bentsherman @pditommaso what is your opinion?
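To make the two behaviours concrete, here is a minimal standalone Java sketch of the scenario described above. It is not Nextflow code: the class `ResumeAttemptSketch`, the methods `resumeAttempt` and `runUntilSuccess`, and the `maxExecutions` cap are hypothetical names introduced only for illustration. Only the `failCount + 1` condition is taken from the diff.

```java
import java.util.function.IntPredicate;

public class ResumeAttemptSketch {

    /**
     * Mirrors the condition in the diff: if failures were recorded and the
     * current attempt does not already account for them, resume at failCount + 1.
     */
    public static int resumeAttempt(int failCount, int currentAttempt) {
        if (failCount > 0 && currentAttempt != failCount + 1) {
            return failCount + 1;
        }
        return currentAttempt;
    }

    /**
     * Hypothetical retry loop: executes the task starting at firstAttempt and
     * retries with attempt + 1 until it succeeds. Returns the number of executions.
     */
    public static int runUntilSuccess(int firstAttempt, IntPredicate succeedsOnAttempt, int maxExecutions) {
        int executions = 0;
        for (int attempt = firstAttempt; executions < maxExecutions; attempt++) {
            executions++;
            if (succeedsOnAttempt.test(attempt)) {
                return executions;
            }
        }
        throw new IllegalStateException("task did not succeed within " + maxExecutions + " executions");
    }

    public static void main(String[] args) {
        // The process from the test: fails on attempt 1, succeeds on any later attempt.
        IntPredicate failsOnlyOnFirstAttempt = attempt -> attempt > 1;

        // One failure was recorded before the run was aborted.
        int failCount = 1;

        // Without the change, the resumed run starts again at attempt 1.
        int withoutChange = runUntilSuccess(1, failsOnlyOnFirstAttempt, 5);

        // With the change, the resumed run starts at failCount + 1.
        int withChange = runUntilSuccess(resumeAttempt(failCount, 1), failsOnlyOnFirstAttempt, 5);

        System.out.println("executions on resume without the change: " + withoutChange);
        System.out.println("executions on resume with the change: " + withChange);
    }
}
```

Under these assumptions the resumed run executes the task twice without the change and once with it, which matches the behaviour described for the added test.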
Close #6884
Alternative for #6882
This pull request enhances task retry logic in Nextflow by improving how task failures and aborts are tracked and incorporated into cache key calculation. The changes ensure that both failure and abort counts are considered, making task resumption and retry behavior more robust and predictable.
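As a rough illustration of why counting failures and aborts in the cache key gives each retry a distinct hash, here is a small Java sketch. It does not reproduce Nextflow's actual hashing: the class `RetryCacheKeySketch`, the method `cacheKey`, the `taskInputs` string and the use of SHA-256 are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RetryCacheKeySketch {

    /**
     * Hypothetical cache key: hashes a description of the task inputs together
     * with the failure and abort counters, so changing either counter changes the key.
     */
    public static String cacheKey(String taskInputs, int failCount, int abortedCount) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            String material = taskInputs + "|fail=" + failCount + "|aborted=" + abortedCount;
            byte[] hash = digest.digest(material.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String inputs = "process=foo;input=sample.txt"; // made-up task description

        String firstRun = cacheKey(inputs, 0, 0);
        String afterFailure = cacheKey(inputs, 1, 0);
        String afterAbort = cacheKey(inputs, 0, 1);

        System.out.println("key changes after a failure: " + !firstRun.equals(afterFailure));
        System.out.println("key changes after an abort: " + !firstRun.equals(afterAbort));
        System.out.println("failure and abort keys differ: " + !afterFailure.equals(afterAbort));
    }
}
```

Because the counters are part of the hashed material, a retry after a failure or an abort never collides with the cache entry of the earlier execution, while an unchanged task with unchanged counters still maps to the same key.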
Task retry and cache logic improvements:
- … failCount and a new abortedCount to ensure that retries after aborts or failures use a distinct hash.
- … (failCount or abortedCount) on the TaskRun object are incremented, ensuring accurate retry attempts and cache key updates.
Task state tracking enhancements:
- … abortedCount field to the TaskRun class to track the number of times a task execution has been aborted.
- … isAborted() and isFailed() helper methods to the TraceRecord class for clearer and more maintainable status checks.
TODO: