Packaging and dependency management

Background

Despite being a fairly small project, we have a somewhat complicated dependency situation because of the variety of environments and use cases we want to support:

MacOS + CPU - our main development environment.
Linux + CUDA - our main production environment.
Linux + CPU - continuous integration, etc.
Testing dependencies (e.g. pytest, pre-commit) beyond the minimal installation.
Extra development dependencies (e.g. jupyter).

Why not install all the development dependencies everywhere? It's slow! CI test runtime is down to 1.5 minutes when we only install the minimal necessary dependencies. Saving instance start-up time in cloud batch jobs saves annoyance and money!
Why is python package management so bad??

What are our goals?

Reproducible environments --> dependency locking. We want to be able to precisely reproduce imprint output far into the future.
Cross-platform dependency management - MacOS + Linux, CUDA + CPU. Jax + CUDA are annoying here.
Automatic dependency updates. Dependabot and Renovatebot scan your repository for out-of-date pinned packages and submit automatic PRs updating those packages.
Easy and fast installs. Mamba, pip and poetry are all fast enough.
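The cross-platform goal boils down to telling three targets apart. Here is a minimal sketch of that classification. The target names and the `nvidia-smi` check are assumptions made for illustration, not something our tooling actually does.

```python
import platform
import shutil


def classify_target(system: str, has_nvidia_driver: bool) -> str:
    """Map an OS name and a GPU flag onto one of the three supported targets.

    The target names ("macos-cpu", etc.) are hypothetical labels.
    """
    if system == "Darwin":
        # macOS is always treated as CPU-only.
        return "macos-cpu"
    if system == "Linux":
        return "linux-cuda" if has_nvidia_driver else "linux-cpu"
    raise ValueError(f"unsupported platform: {system}")


def current_target() -> str:
    """Classify the machine this script is running on.

    Assumption: finding `nvidia-smi` on PATH is used as a rough heuristic
    for "CUDA is available". It is not a reliable check.
    """
    has_nvidia_driver = shutil.which("nvidia-smi") is not None
    return classify_target(platform.system(), has_nvidia_driver)


if __name__ == "__main__":
    print(current_target())
```

Keeping `classify_target` free of side effects makes it easy to check all three targets from any machine.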

What we do most places

1. We use conda/mamba to install python and manage virtual environments. See environment.yml.
2. We use poetry to install critical dependencies: numpy, scipy, jax, etc. See pyproject.toml and poetry.lock.
3. If necessary, we use conda/mamba to install development tools that are more complex and benefit from an OS-level package manager.
4. Docker provides an extra layer of locking and reproducibility, but we are currently not storing images for posterity.

Only step #2 is absolutely necessary.
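The steps above can be sketched as a small helper that builds the install commands for each situation. This is only an illustration: the target names and the choice of which poetry groups go with which target are assumptions. The group names test, cloud and dev are the ones used with poetry up later in this document.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """One install situation. The fields below are illustrative assumptions."""

    use_conda: bool
    poetry_groups: tuple[str, ...]


# Hypothetical mapping from situation to install steps.
TARGETS = {
    "dev": Target(use_conda=True, poetry_groups=("test", "dev")),
    "ci": Target(use_conda=False, poetry_groups=("test",)),  # CI doesn't use conda
    "cloud": Target(use_conda=True, poetry_groups=("cloud",)),
}


def install_commands(target_name: str) -> list[list[str]]:
    """Return the commands to run, in order, as argument lists."""
    target = TARGETS[target_name]
    commands: list[list[str]] = []
    if target.use_conda:
        # Create the conda environment that provides python.
        commands.append(["mamba", "env", "create", "-f", "environment.yml"])
    # The only step that is always required: install from poetry.lock.
    poetry_cmd = ["poetry", "install"]
    if target.poetry_groups:
        poetry_cmd.append("--with=" + ",".join(target.poetry_groups))
    commands.append(poetry_cmd)
    return commands


if __name__ == "__main__":
    for name in TARGETS:
        for cmd in install_commands(name):
            print(f"{name}: {' '.join(cmd)}")
```

Returning argument lists rather than running anything keeps the sketch safe to execute. Each list could be passed to `subprocess.run` if you wanted to automate the install.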

Why not use only conda?

There's no good solution for dependency locking.
Our most challenging package is jax and it is not a conda package!
Conda doesn't help with the CUDA problem at all!
Conda isn't supported by dependabot or renovatebot or other dependency updaters.

Why not use only poetry?

Conda and conda-forge are fantastic for installing lots of python-related packages that might also have non-Python parts or dependencies. This is especially useful for rapid development and "data science".

We're close to only needing poetry, and in some situations we don't need conda at all. For example, the CI workflow doesn't use conda.

However, without conda, we would:

Need an alternative way to install Python. This isn't a hard problem but conda/miniconda/mambaforge are very easy and I like them.
Lose access to the large repository of easy-to-install packages that are often not-at-all-easy-to-install with pip/poetry. (e.g. sage!) This is not important for production jobs but it's nice to have this access for development.
Need to spend time updating various parts of our cloud setup, mainly the Dockerfiles. Updating Dockerfiles is painful because the iteration time is so slow.

How to update packages in poetry.lock and pyproject.toml?

poetry self update --> update poetry itself.
poetry lock --> update poetry.lock based on pyproject.toml
poetry update --> update the locked package versions in poetry.lock to the latest ones allowed by the constraints in pyproject.toml
Updating the version constraints in pyproject.toml itself is not supported by Poetry, but is available through a plugin.
- poetry up --with=test,cloud,dev --> this will update the package versions in pyproject.toml
- The poetry plugin "poetry up"
- Lots of discussion in this issue
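To see why poetry update alone may not be enough, here is a simplified sketch of Poetry's caret constraints. It assumes the constraints in pyproject.toml are caret constraints with exactly three numeric parts, and the version numbers in the usage example are made up. Under a constraint like ^1.2.3, poetry update can move the lock file anywhere below 2.0.0, but reaching 2.0.0 requires changing the constraint itself, which is what the poetry up plugin does.

```python
from __future__ import annotations


def parse(version: str) -> tuple[int, int, int]:
    """Parse 'X.Y.Z' into a tuple of ints (simplified: exactly three parts)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_upper_bound(base: tuple[int, int, int]) -> tuple[int, int, int]:
    """Exclusive upper bound: bump the left-most non-zero component."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def caret_allows(constraint: str, candidate: str) -> bool:
    """Return True if `candidate` satisfies a caret constraint like '^1.2.3'."""
    base = parse(constraint.lstrip("^"))
    return base <= parse(candidate) < caret_upper_bound(base)


if __name__ == "__main__":
    # Hypothetical versions, for illustration only.
    for candidate in ("1.2.3", "1.9.0", "2.0.0"):
        print(candidate, caret_allows("^1.2.3", candidate))
```

Real constraint handling covers many more cases (tilde, wildcards, pre-releases). This sketch only covers the caret rule.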

Notes

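JAX doesn't use a "normal" package registry for everything: the CUDA builds of jaxlib have historically been distributed from a separate wheel index rather than PyPI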
Some discussion on using conda and poetry together
Useful thoughts on different dependency managers
conda and conda env export solves some of the dependency locking problem but has a lot of
conda-lock - tries to solve the package locking issue for conda. Doesn't work well for us because our annoying dependency is jax, which is not a conda package: https://pythonspeed.com/articles/conda-dependency-management/
environment.yml doesn't support platform selectors, but it is one of the most requested features. conda/conda#8089 https://stackoverflow.com/questions/32869042/is-there-a-way-to-have-platform-specific-dependencies-in-environment-yml
pip-tools (pip-compile) - this partially solves the narrow issue of locking package versions but doesn't do it as well as poetry and doesn't have the broader package building/distribution benefits of poetry.
poetry config virtualenvs.create false is useful in CI and Docker to make poetry install into the base conda environment instead of creating its own virtualenv.
