Although imprint is fairly small, its dependency situation is somewhat complicated because of the variety of setups we want to support:
- MacOS + CPU - our main development environment.
- Linux + CUDA - our main production environment.
- Linux + CPU - continuous integration, etc.
- Testing dependencies (e.g. pytest, pre-commit) beyond the minimal installation necessary.
- Extra development dependencies (e.g. jupyter)
- Why not install all the development dependencies everywhere? It's slow! CI test runtime is down to 1.5 minutes when we install only the minimal necessary dependencies. Shorter instance start-up times in cloud batch jobs save annoyance and money!
- Why is Python package management so bad??
- Reproducible environments --> dependency locking. We want to be able to precisely reproduce imprint output far into the future.
- Cross-platform dependency management - MacOS + Linux, CUDA + CPU. JAX + CUDA is especially annoying here.
- Automatic dependency updates. Dependabot and Renovatebot scan your repository for out-of-date pinned packages and submit automatic PRs updating those packages.
- Easy and fast installs. Mamba, pip and poetry are all fast enough.
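Telling the three platform combinations apart is the first step of any cross-platform setup. Here is a minimal Python sketch of that classification, assuming that an nvidia-smi binary on PATH means the machine has a usable CUDA GPU. That heuristic, the function names and the labels are all hypothetical, not part of imprint.

```python
import platform
import shutil


def classify_environment(system: str, has_nvidia_smi: bool) -> str:
    """Map an OS name and a GPU hint onto one of the supported setups.

    `system` is a value like those returned by platform.system(), e.g.
    "Darwin" or "Linux". Treating nvidia-smi as the CUDA signal is an
    assumption made for this sketch.
    """
    if system == "Darwin":
        # No CUDA on macOS: always the CPU development setup.
        return "macos-cpu"
    if system == "Linux":
        return "linux-cuda" if has_nvidia_smi else "linux-cpu"
    raise ValueError(f"unsupported platform: {system!r}")


def current_environment() -> str:
    """Classify the machine this interpreter is running on."""
    # shutil.which returns None when the executable is not on PATH.
    has_gpu = shutil.which("nvidia-smi") is not None
    return classify_environment(platform.system(), has_gpu)
```

Keeping the pure mapping separate from the probing of the current machine means the logic can be tested on any platform.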
1. We use conda/mamba to install Python and manage virtual environments. See environment.yml.
2. We use poetry to install critical dependencies: numpy, scipy, jax, etc. See pyproject.toml and poetry.lock.
3. If necessary, we use conda/mamba to install development tools that are more complex and benefit from an OS-level package manager.
4. Docker provides an extra layer of locking and reproducibility, but we are currently not storing images for posterity.
Only step #2 is absolutely necessary.
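Because the poetry step runs everywhere, it is also where installs can be kept minimal: Poetry dependency groups let a CI job skip the development tools. The sketch below builds a poetry install command per context. The group names test, cloud and dev come from the poetry up command used for updates, but the mapping from context to groups and the function name are assumptions for illustration.

```python
import shlex

# Hypothetical mapping from context to extra Poetry dependency groups.
# Which context needs which group is an assumption, not imprint's config.
GROUPS_BY_CONTEXT = {
    "ci": ["test"],
    "cloud": ["cloud"],
    "dev": ["test", "dev"],
}


def poetry_install_command(context: str) -> list[str]:
    """Build a `poetry install` invocation that installs only what `context` needs."""
    try:
        extra = GROUPS_BY_CONTEXT[context]
    except KeyError:
        raise ValueError(f"unknown context: {context!r}") from None
    # "main" is Poetry's name for [tool.poetry.dependencies]; --only skips
    # every group that is not listed, keeping CI and batch-job installs small.
    groups = ",".join(["main", *extra])
    return ["poetry", "install", "--only", groups]


print(shlex.join(poetry_install_command("ci")))  # → poetry install --only main,test
```

Using --only rather than --with makes the result independent of whether a group is marked optional in pyproject.toml.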
Why not use conda for everything?
- There's no good solution for dependency locking in conda.
- Our most challenging package is jax, and it is not a conda package!
- Conda doesn't help with the CUDA problem at all!
- Conda isn't supported by Dependabot, Renovatebot or other dependency updaters.
Conda and conda-forge are fantastic for installing lots of Python-related packages that might also have non-Python parts or dependencies. This is especially useful for rapid development and "data science".
We're close to needing only poetry, and in some situations we don't need conda at all: the CI workflow doesn't use conda.
However, without conda, we would:
- Need an alternative way to install Python. This isn't a hard problem, but conda/miniconda/mambaforge are very easy and I like them.
- Lose access to the large repository of easy-to-install packages that are often not-at-all-easy-to-install with pip/poetry (e.g. sage!). This is not important for production jobs, but it's nice to have this access for development.
- Need to spend time updating various parts of our cloud setup, mainly the Dockerfiles. Updating Dockerfiles is painful because the iteration time is so slow.
- poetry self update --> update poetry itself.
- poetry lock --> update poetry.lock based on pyproject.toml.
- poetry update --> update package versions (and poetry.lock) to the latest versions allowed by the constraints in pyproject.toml.
- Updating the package versions in pyproject.toml itself is not supported by Poetry, but it is possible through a plugin: poetry up --with=test,cloud,dev --> this will update the package versions in pyproject.toml.
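The update steps above can be chained into one small script. This is a sketch under assumptions: the order (poetry itself, then pyproject.toml, then the lock file) is a guess at a sensible sequence, the function name is hypothetical, and poetry up only exists once the third-party plugin is installed.

```python
import subprocess

# The update commands listed above, in an assumed order.
UPDATE_STEPS = [
    ["poetry", "self", "update"],               # update poetry itself
    ["poetry", "up", "--with=test,cloud,dev"],  # bump versions in pyproject.toml (plugin)
    ["poetry", "lock"],                         # re-resolve poetry.lock against the new constraints
]


def update_dependencies(dry_run: bool = True) -> list[str]:
    """Run the update steps, or with dry_run=True only return them as strings."""
    executed = []
    for step in UPDATE_STEPS:
        executed.append(" ".join(step))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run(step, check=True)
    return executed
```

The dry-run default makes it safe to call just to review the commands before letting it modify pyproject.toml and poetry.lock.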
- The poetry plugin "poetry up"
- Lots of discussion in this issue
- JAX doesn't have a "normal" package registry and doesn't use PyPI
- Some discussion on using conda and poetry together
- Useful thoughts on different dependency managers
- conda and conda env export solve some of the dependency locking problem but have significant limitations.
- conda-lock tries to solve the package locking issue for conda. It doesn't work well for us because our most annoying dependency, jax, is not a conda package: https://pythonspeed.com/articles/conda-dependency-management/
- environment.yml doesn't support platform selectors, even though they are one of the most requested features: conda/conda#8089 https://stackoverflow.com/questions/32869042/is-there-a-way-to-have-platform-specific-dependencies-in-environment-yml
- pip-tools (pip-compile) - this partially solves the narrow issue of locking package versions but doesn't do it as well as poetry and doesn't have the broader package building/distribution benefits of poetry.
poetry config virtualenvs.create false is useful in CI and Docker for installing directly into the base conda environment.