docker: do not get layers if present in local docker service #199

@bpinsard

If one repeatedly clones datasets that contain a docker image on the same machine, each clone needs to get the layers from the remote, even though the image may already be pulled in the local docker.
For example, for parallel processing with fMRIPrep, one will clone the dataset as a temporary clone, but will still need to get the docker layers of the fMRIPrep image, even if these are already present in the docker instance.

Currently, datalad containers-run will run the image if it is present in docker, but it still tries to fetch the layers first, performing unnecessary and costly get operations. For example, if the layers were dropped recklessly, it tries to get the layer, fails, but runs the container anyway.

$ datalad  containers-run -n alpine
[INFO   ] Making sure inputs are available (this may take some time) 
get(error): .datalad/environments/alpine/image/31f5b0c484b3651f7c883d1be2ef442c2da71d103bc7ab20cd0b1f825e67e4e7/layer.tar (file) [not available; (Note that these git remotes have annex-ignore set: origin)]
get(impossible): .datalad/environments/alpine/image (directory) [could not get some content in /home/basile/data/tests/dataset_test/.datalad/environments/alpine/image ['/home/basile/data/tests/dataset_test/.datalad/environments/alpine/image/31f5b0c484b3651f7c883d1be2ef442c2da71d103bc7ab20cd0b1f825e67e4e7/layer.tar']]
[INFO   ] == Command start (output follows) ===== 
/tmp $ 
/tmp $ 
[INFO   ] == Command exit (modification check follows) ===== 
run(ok): /home/basile/data/tests/dataset_test (dataset) [python3 -m datalad_container.adapters.do...]
action summary:                                                                                                                                           
  get (error: 1, impossible: 1)
  run (ok: 1)
  save (notneeded: 1)

I guess this would require changing this line:

extra_inputs=[image_path],

I can see 2 options:

  • delegate the get of the image to the docker adapter, which would first check if the image is present in the local docker.
  • add a cmdline flag to not add the image as an input to the run call (at the risk of failure).
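
A minimal sketch of how both options could fit together, assuming the docker CLI is on the PATH and that the adapter can derive a docker image reference without fetching the layers. The names `image_in_local_docker`, `select_extra_inputs`, `image_ref` and `skip_image_input` are hypothetical and not part of datalad-container; only `docker image inspect` (which exits non-zero when the image is unknown locally) is an existing command.

```python
import subprocess


def image_in_local_docker(image_ref, run=subprocess.run):
    """Return True if `docker image inspect` finds image_ref locally.

    `image_ref` (a docker image name or ID) is a hypothetical parameter;
    how it would be derived from the dataset is not covered here.
    """
    try:
        proc = run(
            ["docker", "image", "inspect", image_ref],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The docker CLI is not installed, so the image cannot be local.
        return False
    return proc.returncode == 0


def select_extra_inputs(image_path, image_ref, skip_image_input=False,
                        is_present=image_in_local_docker):
    """Decide what to pass as extra_inputs for the run call.

    Option 1: leave the image out when docker already has it.
    Option 2: leave it out when the (hypothetical) flag is set,
    accepting that the run may fail if the image is missing.
    """
    if skip_image_input or is_present(image_ref):
        return []
    return [image_path]
```

With something like this, the call site could pass `extra_inputs=select_extra_inputs(image_path, image_ref)` instead of always passing `[image_path]`, so the get only happens when the local docker service does not already hold the image.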

On a side note, would it make sense to add the local docker service on the machine where the dataset is installed as a special remote?

Labels: docker (Issues relating to docker support)
