
Calculations in key-figures.ipynb take too long #3

Closed
r-ford opened this issue Jul 1, 2022 · 12 comments · Fixed by #16
Labels
bug Issues that present a reasonable conviction there is a reproducible bug. medium priority Medium priority issue

Comments

@r-ford
Member

r-ford commented Jul 1, 2022

Similar to this issue in the CMIP6 Cookbook, key-figures.ipynb is too much for GitHub Actions to build, so the cell outputs are not included. In this case, though, I can't just cut out some material.

I'm not sure that Dask is actually implemented properly in this notebook, so a solution to this issue could just be using Dask to speed up the computation.

@r-ford r-ford added the bug Issues that present a reasonable conviction there is a reproducible bug. label Jul 1, 2022
@brian-rose
Member

I doubt that we'll be able to use Dask (or at least use it effectively) on GitHub Actions.

It would be good to figure out why those cells are not running. According to the docs here:
https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits
each job should have a maximum of 6 hours execution time.

@brian-rose
Member

@r-ford do you want to take a look at updating the infrastructure in this repo (ProjectPythia/cookbook-gallery#96) and see if executing on the binder will fix this?

If so we can also close ProjectPythia/cookbook-gallery#12

@r-ford
Member Author

r-ford commented Nov 9, 2022

@brian-rose I just updated the infrastructure, but it looks like there is an error with the code in one of the notebooks:

CellExecutionError: An error occurred while executing the following cell:
------------------
uniques = col.unique(columns=["component", "frequency", "experiment", "variable"])
pprint.pprint(uniques, compact=True, indent=4)
------------------

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [6], line 1
----> 1 uniques = col.unique(columns=["component", "frequency", "experiment", "variable"])
      2 pprint.pprint(uniques, compact=True, indent=4)

TypeError: unique() got an unexpected keyword argument 'columns'

Maybe some of this code is outdated?
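The TypeError says the installed intake-esm's `unique()` no longer accepts a `columns` keyword. One possible workaround is a minimal sketch, assuming `col.df` (the intake-esm catalog's underlying pandas DataFrame) still carries the `component`, `frequency`, `experiment` and `variable` columns. The helper `unique_by_column` is hypothetical, and the toy rows below only stand in for `col.df`; they are not real catalog entries.

```python
import pprint

import pandas as pd


def unique_by_column(df: pd.DataFrame, columns: list[str]) -> dict[str, list]:
    """Map each requested catalog column to its sorted unique values.

    Hypothetical replacement for the old ``col.unique(columns=...)`` call;
    in the notebook you would pass ``col.df`` as ``df``.
    """
    # Drop missing entries so sorting never compares NaN with strings.
    return {c: sorted(df[c].dropna().unique().tolist()) for c in columns}


# Toy rows standing in for col.df (illustrative values only).
toy = pd.DataFrame(
    {
        "component": ["atm", "atm", "ocn"],
        "frequency": ["monthly", "daily", "monthly"],
        "experiment": ["20C", "RCP85", "20C"],
        "variable": ["TREFHT", "PRECT", "SST"],
    }
)

uniques = unique_by_column(toy, ["component", "frequency", "experiment", "variable"])
pprint.pprint(uniques, compact=True, indent=4)
```

In the notebook itself, the original cell would become `uniques = unique_by_column(col.df, ["component", "frequency", "experiment", "variable"])`, which avoids depending on the signature of `unique()` at all.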

@brian-rose
Member

@r-ford ok, I guess we'll have to take a deeper dive into the code. Seems like some sort of conflict with the intake_esm catalog.

@brian-rose
Member

We'll try to get this running before the AGU meeting.

@ktyle
Contributor

ktyle commented Dec 6, 2022

@brian-rose @r-ford @clyne I am making some progress fixing the problematic code cells (the example-workflows/key-figures.ipynb notebook runs to completion now), but the issue remains that the notebooks take extremely long to run (on our amped-up department server, several cells each take ~25 minutes to complete, and that's with Dask).

@brian-rose
Member

Hmmm there appears to be a new Pangeo binder running on AWS: https://hub.aws-uswest2-binder.pangeo.io/

I wonder if that's a better platform for running this code, since the data are in S3 storage.

@brian-rose
Member

There's an authentication step that uses a GitHub user account. I'm not sure what will happen if we just set

binderhub_url: https://hub.aws-uswest2-binder.pangeo.io

in the config file.

@ktyle
Contributor

ktyle commented Dec 7, 2022

@brian-rose Given that (at present) we don't know how to make binderbot handle authentication, and that this notebook takes well over an hour to run, I think it would be better to present this notebook as HTML, fully rendered after it has run successfully locally, and not run it via GitHub Actions or binderbot. This way we would at least be able to include it in the cookbook.
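The local pre-execution step proposed here could look roughly like the following minimal sketch, assuming `nbformat` and `nbconvert` are installed and a `python3` Jupyter kernel is available. The function names and output file names are hypothetical, not part of the cookbook's tooling.

```python
from pathlib import Path

import nbformat
from nbconvert import HTMLExporter
from nbconvert.preprocessors import ExecutePreprocessor


def execute_notebook(src: str, dst: str, kernel_name: str = "python3") -> None:
    """Run every cell of ``src`` locally and save the notebook, outputs included, to ``dst``."""
    nb = nbformat.read(src, as_version=4)
    # timeout=None disables the per-cell timeout; some cells here run ~25 minutes.
    ep = ExecutePreprocessor(timeout=None, kernel_name=kernel_name)
    # Execute with the notebook's own folder as the working directory so relative paths resolve.
    ep.preprocess(nb, {"metadata": {"path": str(Path(src).parent)}})
    nbformat.write(nb, dst)


def export_html(nb_path: str, html_path: str) -> None:
    """Render an already-executed notebook to a standalone HTML page."""
    nb = nbformat.read(nb_path, as_version=4)
    body, _resources = HTMLExporter().from_notebook_node(nb)
    Path(html_path).write_text(body, encoding="utf-8")
```

For example, `execute_notebook("example-workflows/key-figures.ipynb", "key-figures-executed.ipynb")` followed by `export_html("key-figures-executed.ipynb", "key-figures.html")`, run from the directory that contains `example-workflows/`. Committing the executed `.ipynb` keeps its outputs visible without GitHub Actions ever re-running the cells.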

@r-ford
Member Author

r-ford commented Dec 8, 2022

I think this makes sense for now. If we're expecting some more readers soon, it would be nice if they could at least see the output.

@brian-rose
Member

@ktyle @r-ford agreed that committing a pre-executed version of this notebook is an acceptable short-term solution, given that it has useful content and that we are actively pursuing solutions for reproducible health checks etc.

I left some notes in #12 about how to implement this and get the whole Cookbook successfully building and publishing itself.

@ktyle
Contributor

ktyle commented May 8, 2023

We will revisit when the new BinderHub is up and running.

@ktyle ktyle removed the high priority High priority issue label May 8, 2023
@brian-rose brian-rose added the medium priority Medium priority issue label May 8, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in Pythia Projects Board Jun 16, 2023
Projects
Status: Done

4 participants