You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When task (with b2luigi.on_temporary_files has outputs A and B, and A is missing but B exists, the moving of the temporary files after the task had been successful fails if one of the outputs already exists.
Due to the nature of on_temporary_files, this doesn't happen naturally when running tasks because the final outputs can only exist if the task had been successful and thus all outputs should be there. However, this can happen when changing output definitions or when deleting some of the outputs but not others manually.
I solved this in my tasks by extending the process method with a clean-up step, but maybe this is something we want to fix in the moving of temporary files.
Or is this a behaviour a feature and we want the users to be explicitwhat to do with existing data to prevent the loss of it? After all, luigi itself does not clean-up of outputs either. Anyway the user uses a temporary wrapper, if they force-writes the output in the process-method, this will not help because he will only force-write the temporary file location, so I think we should fix that...
The text was updated successfully, but these errors were encountered:
Okay I just checked and the copying of outputs after success is not done by b2luigi but by the luigi temporary_path function, in particular the exception that the target exists is called in the function rename_dont_move.
So this seems to be an idea or conscious design-design from luigi. Does anyone experienced with luigi have an idea why they implemented it that way? Just wanted to ask here before opening a luigi issue/PR.
Luigi Tasks should not have multiple output files in theory. That is against the atomicity and idempotency of tasks (two properties you normally aim for in workflow managers). So yes, it is a design decision.
Typically, you would use a _SUCCESS file if this is you use-case after all files have been written (and only depend on it).
However, in practice these things do occur and fixing it seems like a plausible solution. Just wanted to give some background.
When task (with
b2luigi.on_temporary_files
has outputs A and B, and A is missing but B exists, the moving of the temporary files after the task had been successful fails if one of the outputs already exists.Due to the nature of
on_temporary_files
, this doesn't happen naturally when running tasks because the final outputs can only exist if the task had been successful and thus all outputs should be there. However, this can happen when changing output definitions or when deleting some of the outputs but not others manually.I solved this in my tasks by extending the
process
method with a clean-up step, but maybe this is something we want to fix in the moving of temporary files.Or is this a behaviour a feature and we want the users to be explicitwhat to do with existing data to prevent the loss of it? After all, luigi itself does not clean-up of outputs either. Anyway the user uses a temporary wrapper, if they force-writes the output in the
process
-method, this will not help because he will only force-write the temporary file location, so I think we should fix that...The text was updated successfully, but these errors were encountered: