Prefect v2.10.10

Is there a way to prevent multiple copies of the "same" task from being run at the same time? I have multiple flows, each composed of a variety of tasks. Some of these tasks do exactly the same work, and I have set their cache keys appropriately. If I run the flows serially, one after another, the duplicate tasks see the existing cache key and no-op. Yay Prefect! But consider the following:
Is there any way to avoid this? What would be ideal is if Flow2 could see that TaskABC is currently being run by Flow1 and simply poll until it's complete, then run with the cached result. Is there a way to achieve this in the current system? Ideally something baked in (I don't think there is, but it doesn't hurt to ask), or failing that, something manual using the client API? Thanks!
Replies: 1 comment
Self-answering for the benefit of future generations:

Concurrency limits are the answer. It's messy, but here is what I'm doing: immediately before launching the task, I generate a unique identifier for it (basically, the cache_key) and then make a POST to the server to set the concurrency limit to 1 for that identifier (n.b. most times it'll already be set, but it's the only way to be sure).

Then I simply add the identifier as a tag on the task, and I'm done.

Works exactly how I wanted - Flow2 is active, but Flow2/TaskABC doesn't get run until Flow1/TaskABC is done, at which point it sees the cached value and skips. The GUI could do with an indication that the task wants to run but is blocked due to concurrency (currently the live indicator simply sits there scrolling through empty space), but I feel churlish for even mentioning it.
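For anyone wanting to try the same thing, here is a minimal sketch of the "derive a tag from the cache key, then POST a limit of 1 for it" step, using only the standard library. It is not the exact code from my flows. The helper names (`limit_tag`, `build_limit_request`, `ensure_limit`) and the `dedupe-` prefix are made up for illustration, and the endpoint path and payload shape (`POST <api_url>/concurrency_limits/` with `tag` and `concurrency_limit`) are assumptions about the Prefect 2.x server API, so check them against the version you run.

```python
import hashlib
import json
import urllib.request


def limit_tag(cache_key: str) -> str:
    """Derive a short, deterministic tag from a task's cache key.

    Hypothetical helper: hashing keeps the tag short and free of odd characters.
    """
    digest = hashlib.sha256(cache_key.encode("utf-8")).hexdigest()[:16]
    return f"dedupe-{digest}"


def build_limit_request(api_url: str, tag: str, limit: int = 1) -> urllib.request.Request:
    """Build (but do not send) the POST that sets a concurrency limit for `tag`.

    Assumption: the server accepts {"tag": ..., "concurrency_limit": ...}
    at <api_url>/concurrency_limits/.
    """
    body = json.dumps({"tag": tag, "concurrency_limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_url.rstrip('/')}/concurrency_limits/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ensure_limit(api_url: str, cache_key: str) -> str:
    """Send the request and return the tag to attach to the task."""
    tag = limit_tag(cache_key)
    with urllib.request.urlopen(build_limit_request(api_url, tag)) as response:
        response.read()  # the response body is not needed here
    return tag
```

Inside the flow, the returned tag would then be attached just before the call, for example with something like `task_abc.with_options(tags=[tag])(...)`, where `task_abc` stands in for your own task. The tag is derived from the cache key so that two flows about to do the same work arrive at the same tag, and therefore the same limit, without coordinating with each other.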