Description
If one repeatedly clones a dataset that contains a Docker image onto the same machine, each clone needs to get the layers from the remote, even though the image may already be pulled into the local Docker daemon.
For example, for parallel processing with fMRIPrep, one clones the dataset as a temporary clone, but still needs to get the Docker layers of the fMRIPrep image, even if they are already present in the Docker instance.
Currently, datalad containers-run will run the image if it is present in Docker but still tries to fetch the layers, performing unnecessary and costly get operations. For example, if the layers were dropped recklessly, it tries to get them, fails, but runs the container anyway:
$ datalad containers-run -n alpine
[INFO ] Making sure inputs are available (this may take some time)
get(error): .datalad/environments/alpine/image/31f5b0c484b3651f7c883d1be2ef442c2da71d103bc7ab20cd0b1f825e67e4e7/layer.tar (file) [not available; (Note that these git remotes have annex-ignore set: origin)]
get(impossible): .datalad/environments/alpine/image (directory) [could not get some content in /home/basile/data/tests/dataset_test/.datalad/environments/alpine/image ['/home/basile/data/tests/dataset_test/.datalad/environments/alpine/image/31f5b0c484b3651f7c883d1be2ef442c2da71d103bc7ab20cd0b1f825e67e4e7/layer.tar']]
[INFO ] == Command start (output follows) =====
/tmp $
/tmp $
[INFO ] == Command exit (modification check follows) =====
run(ok): /home/basile/data/tests/dataset_test (dataset) [python3 -m datalad_container.adapters.do...]
action summary:
get (error: 1, impossible: 1)
run (ok: 1)
save (notneeded: 1)
I guess it would require changing this line:

    extra_inputs=[image_path],
I can see two options:
- delegate the get of the image to the Docker adapter, which would first check whether the image is already present in the local Docker daemon.
- add a command-line flag to not add the image as an input to the run call (at the risk of failure).
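A minimal sketch of the first option, assuming a hypothetical helper in the Docker adapter (the function names and the lookup key are illustrative, not the actual datalad_container API): before declaring the image directory as an input to fetch, ask the local Docker daemon whether the image already exists.

```python
import subprocess


def image_in_local_docker(image_id: str) -> bool:
    """Return True if the image is already known to the local Docker daemon.

    ``image_id`` would be the image ID or tag recorded for the dataset's
    .datalad/environments/<name>/image directory (hypothetical lookup key).
    """
    try:
        result = subprocess.run(
            ["docker", "image", "inspect", image_id],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # No docker CLI on this machine; fall back to fetching the layers.
        return False
    return result.returncode == 0


def inputs_to_fetch(image_path: str, image_id: str) -> list:
    """Skip fetching the image directory when Docker already has the image."""
    return [] if image_in_local_docker(image_id) else [image_path]
```

With something like this, `extra_inputs` could be computed conditionally instead of always containing `image_path`, so a clone on a machine that already pulled the image would not trigger any get operations for the layers.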
On a side note, would it make sense to add the local Docker service on the machine where the dataset is installed as a special remote?