Description
During testing, bioinformaticians have raised multiple questions about the status of running workflows: It's been running for 4 days, is it stuck? What step is it on? When will it be done?
Currently, bespin/lando report a handful of high-level job states: creating a VM, downloading data, running workflow, uploading data, terminating VM. These are visible in the API, UI, and CLI, but they don't answer the question of how far along the workflow is or what to expect.
We (IT) can log in to the VMs and tail a logfile or look at the running processes/docker containers to answer these questions. This works in the short term, but it's not a viable long-term strategy.
We considered some approaches based on the worker log files (see Duke-GCB/lando#46 for background), but we don't want to implement this in a way that depends on a specific CWL engine.
Ideally we could report progress in terms of a semantic unit like a "sample", rather than a file or workflow step.
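
To make the "sample" idea a bit more concrete, here is a rough sketch of what an engine-agnostic progress record might look like. Everything in it is hypothetical: `SampleState`, `WorkflowProgress`, and the sample names are made up for illustration and are not part of bespin or lando. How the per-sample state would actually be collected from a running workflow is left open.

```python
from dataclasses import dataclass, field
from enum import Enum


class SampleState(Enum):
    """Hypothetical per-sample states; not an existing bespin/lando enum."""
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class WorkflowProgress:
    """Hypothetical progress record keyed by sample name, not by CWL step."""
    samples: dict = field(default_factory=dict)

    def update(self, sample: str, state: SampleState) -> None:
        # Record the latest known state for one sample.
        self.samples[sample] = state

    def summary(self) -> str:
        # Count finished samples so a user can see how far along the job is.
        done = sum(1 for s in self.samples.values() if s is SampleState.DONE)
        return f"{done}/{len(self.samples)} samples done"


progress = WorkflowProgress()
progress.update("sample_A", SampleState.DONE)
progress.update("sample_B", SampleState.RUNNING)
print(progress.summary())  # 1/2 samples done
```

A summary like this could sit alongside the existing high-level job states in the API, UI, and CLI without tying the report to any particular CWL engine's log format.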
Thoughts?