MAINT: Fix build failures and execution timeouts #68

Merged
melissawm merged 13 commits into numpy:main on Mar 17, 2021

rossbar
Collaborator

@rossbar rossbar commented Mar 16, 2021

This PR combines a plethora of tweaks and fixes to make the build/test/deploy infrastructure more reliable. The end result should be a reduction in CI failures due to things like resource requirements and execution timeouts.

Note that this also includes some changes to the tutorials themselves, most notably the deep learning tutorials, to improve their usability in resource-constrained environments such as Binder. Two changes to tutorial content need special attention:

  1. The evaluation step of the mnist tutorial has been vectorized. This has the advantage of demonstrating vectorization, but also significantly improves run-time performance.
  2. The number of episodes in the RL tutorial has been reduced from 10 to 3 (along with the batch size). This is entirely a run-time argument: running the deep-rl tutorial in e.g. CI or Binder would take on the order of 10 minutes for the training step. Furthermore, since there is no tangible output from the training, I decided to reduce it further to make the tutorial more usable. @8bitmp3 your opinion on whether this is acceptable would be much appreciated.
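
To illustrate the first point, here is a minimal sketch of what vectorizing an evaluation step can look like. It is not the tutorial's actual code: the array shapes, the random `images` and `labels`, the weight names, and both helper functions are assumptions chosen for illustration, and the labels are plain integers rather than whatever encoding the tutorial uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the tutorial's data and weights (assumed shapes).
n_samples, n_pixels, n_hidden, n_classes = 1000, 784, 100, 10
images = rng.random((n_samples, n_pixels))
labels = rng.integers(0, n_classes, size=n_samples)
weights_1 = 0.2 * rng.random((n_pixels, n_hidden)) - 0.1
weights_2 = 0.2 * rng.random((n_hidden, n_classes)) - 0.1


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0)


def accuracy_loop(images, labels):
    """Evaluate one sample at a time in a Python loop."""
    correct = 0
    for image, label in zip(images, labels):
        hidden = relu(image @ weights_1)
        output = hidden @ weights_2
        correct += int(np.argmax(output) == label)
    return correct / len(images)


def accuracy_vectorized(images, labels):
    """Evaluate every sample in a single batched forward pass."""
    hidden = relu(images @ weights_1)        # shape (n_samples, n_hidden)
    output = hidden @ weights_2              # shape (n_samples, n_classes)
    predictions = np.argmax(output, axis=1)  # one prediction per row
    return np.mean(predictions == labels)
```

Both functions compute the same accuracy. The vectorized version replaces the per-sample Python loop with two matrix products, which is typically where the run-time saving comes from.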

Most of the other changes are related to minor build fixes (broken links, docs missing from toctrees, etc.) or additional CI infrastructure necessary for new dependencies (particularly atari-py). Once this PR is in we can also cache the tutorial dependencies, which will significantly reduce the build time as we can skip having to build the atari-py wheel from scratch every run. I'd also like to experiment with caching the tutorial execution during the build process, which depends on this as well.

@rossbar rossbar changed the title from "MAINT: Fix build failures related to pairing.md" to "MAINT: Fix build failures from new tutorial additions" on Mar 16, 2021
@rossbar
Collaborator Author

rossbar commented Mar 16, 2021

There's also an issue with the atari-py dependency - see #69. b7e8c3c is a temporary workaround for CI.

@rossbar rossbar changed the title from "MAINT: Fix build failures from new tutorial additions" to "MAINT: Fix build failures and execution timeouts" on Mar 16, 2021
Member

@melissawm melissawm left a comment

LGTM, @rossbar! Thanks!

@melissawm melissawm merged commit 8d0f765 into numpy:main Mar 17, 2021
@8bitmp3
Contributor

8bitmp3 commented Mar 20, 2021

@rossbar that's some magic! Thank you 🙏

> Reducing the number of episodes in the RL tutorial from 10 -> 3 (along with batch size). This is entirely a run-time argument: running the deep-rl tutorial in e.g. CI or binder would take on the order of 10 minutes for the training step. Furthermore, since there is no tangible output from the training, I decided to reduce it further to make the tutorial more usable. @8bitmp3 your opinion on whether this is acceptable would be much appreciated.

The original 10 steps were too low, but we make it clear here: "...let's limit the number of episodes for training to 3. If you are using hardware acceleration (CPUs and GPUs), you can increase the number to 1,000 or beyond."
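
As a rough sketch of how such an episode cap bounds run time, consider the loop below. Everything in it is hypothetical: `run_episode` is a placeholder rather than the tutorial's game loop, and the `batch_size` value is assumed, since the thread only says the batch size was reduced.

```python
import numpy as np

# Hypothetical settings mirroring the values discussed above.
max_episodes = 3  # can be raised to 1,000 or beyond when hardware allows
batch_size = 3    # assumed value: episodes per parameter update


def run_episode(rng):
    """Placeholder for one game episode; returns a made-up total reward."""
    return float(rng.integers(-21, 22))


def train(max_episodes, batch_size, seed=0):
    """Run a fixed number of episodes and count the parameter updates."""
    rng = np.random.default_rng(seed)
    rewards = []
    updates = 0
    for episode in range(1, max_episodes + 1):
        rewards.append(run_episode(rng))
        if episode % batch_size == 0:
            updates += 1  # a real loop would apply accumulated gradients here
    return rewards, updates


rewards, updates = train(max_episodes, batch_size)
```

Total run time scales with `max_episodes`, so keeping it a single named setting lets readers with more compute raise it without touching the loop.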

> The evaluation step of the mnist tutorial has been vectorized. This has the advantage of demonstrating vectorization, but also significantly improves run-time performance.

That's amazing.

3 participants